INTRODUCTION

Technical Field

The field of this invention concerns segment polarity genes and their uses.

Background

Segment polarity genes were discovered in flies as mutations which change
the pattern of structures of the body segments. Mutations in the genes cause animals
to develop the changed patterns on the surfaces of body segments, the changes
affecting the pattern along the head to tail axis. For example, mutations in the gene
patched cause each body segment to develop without the normal structures in the
center of each segment. In their stead is a mirror image of the pattern normally
found in the anterior segment. Thus cells in the center of the segment make the
wrong structures, and point them in the wrong direction with reference to the
overall head-to-tail polarity of the animal. About sixteen genes in the class are known.
The encoded proteins include kinases, transcription factors, a cell junction protein,
two secreted proteins called wingless (WG) and hedgehog (HH), a single
transmembrane protein called patched (PTC), and some novel proteins not related to
any known protein. All of these proteins are believed to work together in signaling
pathways that inform cells about their neighbors in order to set cell fates and
polarities.

Many of the segment polarity proteins of Drosophila and other invertebrates
are closely related to vertebrate proteins, implying that the molecular mechanisms
involved are ancient. Among the vertebrate proteins related to the fly genes are
En-1 and -2, which act in vertebrate brain development, and WNT-1, which is also
involved in brain development but was first found as the oncogene implicated in
many cases of mouse breast cancer. In flies, the patched gene is transcribed into
RNA in a complex and dynamic pattern in embryos, including fine transverse stripes
in each body segment primordium. The encoded protein is predicted to contain
many transmembrane domains. It has no significant similarity to any other known
protein. Other proteins having large numbers of transmembrane domains include a
variety of membrane receptors, channels through membranes and transporters
through membranes.

The hedgehog (HH) protein of flies has been shown to have at least three
vertebrate relatives: Sonic hedgehog (Shh), Indian hedgehog, and Desert hedgehog.
Shh is expressed in a group of cells at the posterior of each developing limb
bud. This is exactly the same group of cells found to have an important role in
signaling polarity to the developing limb. The signal appears to be graded, with
cells close to the posterior source of the signal forming posterior digits and other
limb structures and cells farther from the signal source forming more anterior
structures. It has been known for many years that transplantation of the signaling
cells, a region of the limb bud known as the "zone of polarizing activity" (ZPA), has
dramatic effects on limb patterning. Implanting a second ZPA anterior to the limb
bud causes a limb to develop with posterior features replacing the anterior ones (in
essence little fingers instead of thumbs). Shh has been found to be the long sought
ZPA signal. Cultured cells making Shh protein (SHH), when implanted into the
anterior limb bud region, have the same effect as an implanted ZPA. This
establishes that Shh is a critical trigger of posterior limb development.

The factor in the ZPA has been thought for some time to be related to
another important developmental signal that polarizes the developing spinal cord.
The notochord, a rod of mesoderm that runs along the dorsal side of early vertebrate
embryos, is a signal source that polarizes the neural tube along the dorsal-ventral
axis. The signal causes the part of the neural tube nearest to the notochord to form
floor plate, a morphologically distinct part of the neural tube. The floor plate, in
turn, sends out signals to the more dorsal parts of the neural tube to further
determine cell fates. The ZPA was reported to have the same signaling effect as the
notochord when transplanted to be adjacent to the neural tube, suggesting the ZPA
makes the same signal as the notochord. In keeping with this view, Shh was found
to be produced by notochord cells and floor plate cells. Tests of extra expression of
Shh in mice led to the finding of extra expression of floor plate genes in cells which
would not normally turn them on. Therefore Shh appears to be a component of the
signal from notochord to floor plate and from floor plate to more dorsal parts of the
neural tube. Besides limb and neural tubes, vertebrate hedgehog genes are also
expressed in many other tissues including, but not limited to, the peripheral nervous
system, brain, lung, liver, kidney, tooth primordia, genitalia, and hindgut and
foregut endoderm.

PTC has been proposed as a receptor for HH protein based on genetic
experiments in flies. A model for the relationship is that PTC acts through a largely
unknown pathway to inactivate both its own transcription and the transcription of the
wingless segment polarity gene. This model proposes that HH protein, secreted
from adjacent cells, binds to the PTC receptor, inactivates it, and thereby prevents
PTC from turning off its own transcription or that of wingless. A number of
experiments have shown coordinate events between PTC and HH.

Relevant Literature

Descriptions of patched, by itself or its role with hedgehog, may be found in
Hooper and Scott, Cell 59, 751-765 (1989); Nakano et al., Nature 341, 508-513
(1989) (both of which also describe the sequence for Drosophila patched); Simcox
et al., Development 107, 715-722 (1989); Hidalgo and Ingham, Development 110,
291-301 (1990); Phillips et al., Development 110, 105-114 (1990); Sampedro and
Guerrero, Nature 353, 187-190 (1991); Ingham et al., Nature 353, 184-187 (1991);
and Taylor et al., Mechanisms of Development 42, 89-96 (1993). Discussions of
the role of hedgehog include Riddle et al., Cell 75, 1401-1416 (1993); Echelard et
al., Cell 75, 1417-1430 (1993); Krauss et al., Cell 75, 1431-1444 (1993); Tabata
and Kornberg, Cell 76, 89-102 (1994); Heemskerk & DiNardo, Cell 76, 449-460
(1994); Roelink et al., Cell 76, 761-775 (1994); and a short review article by
Ingham, Current Biology 4, 347-350 (1994). The sequence for the Drosophila 5'
non-coding region was reported to GenBank, accession number M28418,
referred to in Hooper and Scott (1989), supra. See also Forbes et al.,
Development 1993 Supplement, 115-124.

SUMMARY OF THE INVENTION
Methods for isolating patched genes, particularly mammalian patched genes, 
including the mouse and human patched genes, as well as invertebrate patched genes 
and sequences, are provided. The methods include identification of patched genes 
from other species, as well as members of the same family of proteins. The subject 
genes provide methods for producing the patched protein, where the genes and 
proteins may be used as probes for research, diagnosis, binding of hedgehog protein 
for its isolation and purification, gene therapy, as well as other utilities.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a graph having a restriction map of about 10 kbp of the 5' region
upstream from the initiation codon of the Drosophila patched gene and bar graphs of
constructs of truncated portions of the 5' region joined to β-galactosidase, where the
constructs are introduced into fly cell lines for the production of embryos. The
expression of β-gal in the embryos is indicated in the right-hand table during early
and late development of the embryo. The greater the number of +'s, the more
intense the staining.

DESCRIPTION OF SPECIFIC EMBODIMENTS

Methods are provided for identifying members of the patched (ptc) gene
family from invertebrate and vertebrate, e.g. mammalian, species, as well as the
entire cDNA sequence of the mouse and human patched gene. Also, sequences for
invertebrate patched genes are provided. The patched gene encodes a
transmembrane protein having a large number of transmembrane sequences.

In identifying the mouse and human patched genes, primers were employed
to move through the evolutionary tree from the known Drosophila ptc sequence.
Two primers are employed from the Drosophila sequence with appropriate
restriction enzyme linkers to amplify portions of genomic DNA of a related
invertebrate, such as mosquito. The sequences are selected from regions which are
not likely to diverge over evolutionary time and are of low degeneracy.
Conveniently, the regions are the N-terminal proximal sequence, generally within
the first 1.5 kb, usually within the first 1 kb, of the coding portion of the cDNA,
conveniently in the first hydrophilic loop of the protein. Employing the polymerase
chain reaction (PCR) with the primers, a band can be obtained from mosquito
genomic DNA. The band may then be amplified and used in turn as a probe. One
may use this probe to probe a cDNA library from an organism in a different branch
of the evolutionary tree, such as a butterfly. By screening the library and
identifying sequences which hybridize to the probe, a portion of the butterfly
patched gene may be obtained. One or more of the resulting clones may then be
used to rescreen the library to obtain an extended sequence, up to and including the
entire coding region, as well as the non-coding 5'- and 3'-sequences. As
appropriate, one may sequence all or a portion of the resulting cDNA coding
sequence.

One may then screen a genomic or cDNA library of a species higher in the
evolutionary scale with appropriate probes from one or both of the prior sequences.
Of particular interest is screening a genomic library of a distantly related
invertebrate, e.g. beetle, where one may use a combination of the sequences
obtained from the previous two species, in this case, the Drosophila and the
butterfly. By appropriate techniques, one may identify specific clones which bind to
the probes, which may then be screened for cross hybridization with each of the
probes individually. The resulting fragments may then be amplified, e.g. by
subcloning.

By having all or parts of the 4 different patched genes, in the presently
illustrated example, Drosophila (fly), mosquito, butterfly and beetle, one can now
compare the patched genes for conserved sequences. Cells from an appropriate
mammalian limb bud or other cells expressing patched, such as notochord, neural
tube, gut, lung buds, or other tissue, particularly fetal tissue, may be employed for
screening. Alternatively, adult tissue which produces patched may be employed for
screening. Based on the consensus sequence available from the 4 other species, one
can develop probes where at each site at least 2 of the sequences have the same
nucleotide; where the site varies such that each species has a unique nucleotide,
inosine may be used, which binds to all four nucleotides.
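The consensus rule just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `consensus_probe`, the toy sequences, and the handling of a 2-versus-2 tie (treated here like a fully variable site and given inosine, written `I`) are hypothetical and are not taken from the specification.

```python
from collections import Counter


def consensus_probe(aligned_seqs, min_agree=2):
    """Build a probe string from equal-length, pre-aligned DNA sequences.

    A site receives the nucleotide shared by at least `min_agree` sequences.
    A site where no nucleotide reaches that count, or where the two most
    common nucleotides tie (an assumption of this sketch), receives 'I'
    for inosine.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    probe = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column).most_common()
        top_base, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        if top_count >= min_agree and not tied:
            probe.append(top_base)
        else:
            probe.append("I")
    return "".join(probe)


# Hypothetical toy alignment standing in for fly, mosquito, butterfly, beetle.
toy_alignment = ["ATGGCA", "ATGGCT", "ATCGCG", "ATAGCC"]
print(consensus_probe(toy_alignment))
```

In the toy alignment the third site is kept as G because two sequences agree, while the last site, where all four sequences differ, becomes inosine.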

Either PCR may be employed using primers or, if desired, a genomic library
from an appropriate source may be probed. With PCR, one may use a cDNA
library or use reverse transcriptase-PCR (RT-PCR), where mRNA is available from
the tissue. Usually, where fetal tissue is employed, one will employ tissue from the
first or second trimester, preferably the latter half of the first trimester or the second
trimester, depending upon the particular host. The age and source of tissue will
depend to a significant degree on the ability to surgically isolate the tissue based on
its size, the level of expression of patched in the cells of the tissue, the accessibility
of the tissue, the number of cells expressing patched and the like. The amount of
tissue available should be large enough so as to provide for a sufficient amount of
mRNA to be usefully transcribed and amplified. With mouse tissue, limb bud of
from about 10 to 15 dpc (days post conception) may be employed.

In the primers, the complementary binding sequence will usually be at least
14 nucleotides, preferably at least about 17 nucleotides and usually not more than
about 30 nucleotides. The primers may also include a restriction enzyme sequence
for isolation and cloning. With RT-PCR, the mRNA may be enriched in accordance
with known ways and reverse transcribed, followed by amplification with the
appropriate primers. (Procedures employed for molecular cloning may be found in
Molecular Cloning: A Laboratory Manual, Sambrook et al., eds., Cold Spring
Harbor Laboratories, Cold Spring Harbor, NY, 1988). Particularly, the primers may
conveniently come from the N-terminal proximal sequence or other conserved
region, such as those sequences where at least five amino acids are conserved out of
eight amino acids in three of the four sequences. This is illustrated by the sequences
(SEQ ID NO:11) ITTPLDCFWEG, (SEQ ID NO:12) LIVGG, and (SEQ ID NO:13)
PFFWEQY. Resulting PCR products of expected size are subcloned and may be
sequenced if desired.

The cloned PCR fragment may then be used as a probe to screen a cDNA
library of mammalian tissue cells expressing patched, where hybridizing clones may
be isolated under appropriate conditions of stringency. Again, the cDNA library
should come from tissue which expresses patched, which tissue will come within the
limitations previously described. Clones which hybridize may be subcloned and
rescreened. The hybridizing subclones may then be isolated and sequenced or may
be further analyzed by employing RNA blots and in situ hybridizations in whole and
sectioned embryos. Conveniently, a fragment of from about 0.5 to 1 kbp of the
N-terminal coding region may be employed for the Northern blot.

The mammalian gene may be sequenced and, as described above, conserved
regions identified and used as primers for investigating other species. The
N-terminal proximal region, the C-terminal region or an intermediate region may be
employed for the sequences, where the sequences will be selected having minimum
degeneracy and the desired level of conservation over the probe sequence.
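To make "minimum degeneracy" concrete, the following Python sketch counts how many distinct coding sequences could encode a peptide under the standard genetic code, and finds the least degenerate window of a given length. It is a sketch only: the function names are hypothetical, and choosing primer sites by this count alone is an assumption rather than the procedure of the specification.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def backtranslation_degeneracy(peptide):
    """Count the distinct nucleotide sequences that encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(peptide, k):
    """Return (window, degeneracy) for the k-residue stretch with the fewest
    possible coding sequences; the first such window wins a tie."""
    windows = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
    best = min(windows, key=backtranslation_degeneracy)
    return best, backtranslation_degeneracy(best)


# The three conserved peptides quoted in the text.
for pep in ("ITTPLDCFWEG", "LIVGG", "PFFWEQY"):
    print(pep, backtranslation_degeneracy(pep))
print(least_degenerate_window("ITTPLDCFWEG", 6))
```

Residues with a single codon (tryptophan, methionine) or two codons lower the count sharply, which is why stretches rich in such residues make convenient primer sites.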

The DNA sequence encoding PTC may be cDNA or genomic DNA or a
fragment thereof, particularly complete exons from the genomic DNA; may be
isolated as the sequence substantially free of wild-type sequence from the
chromosome; may be a 50 kbp fragment or smaller fragment; or may be joined to
heterologous or foreign DNA, which may be a single nucleotide, an oligonucleotide of
up to 50 bp, which may be a restriction site or other identifying DNA for use as a
primer, probe or the like, or a nucleic acid of greater than 50 bp, where the nucleic
acid may be a portion of a cloning or expression vector, comprise the regulatory
regions of an expression cassette, or the like. The DNA may be isolated, purified
being substantially free of proteins and other nucleic acids, be in solution, or the
like.

The subject gene may be employed for producing all or portions of the
patched protein. The subject gene or fragment thereof, generally a fragment of at
least 12 bp, usually at least 18 bp, may be introduced into an appropriate vector for
extrachromosomal maintenance or for integration into the host. Fragments will
usually be immediately joined at the 5' and/or 3' terminus to a nucleotide or
sequence not found in the natural or wild-type gene, or joined to a label other than a
nucleic acid sequence. For expression, an expression cassette may be employed,
providing for a transcriptional and translational initiation region, which may be
inducible or constitutive, the coding region under the transcriptional control of the
transcriptional initiation region, and a transcriptional and translational termination
region. Various transcriptional initiation regions may be employed which are
functional in the expression host. The peptide may be expressed in prokaryotes or
eukaryotes in accordance with conventional ways, depending upon the purpose for
expression. For large production of the protein, a unicellular organism or cells of a
higher organism, e.g. eukaryotes such as vertebrates, particularly mammals, may be
used as the expression host, such as E. coli, B. subtilis, S. cerevisiae, and the like.
In many situations, it may be desirable to express the patched gene in a mammalian
host, whereby the patched gene product will be transported to the cellular membrane
for various studies. The protein has two parts which provide for a total of six
transmembrane regions, with a total of six extracellular loops, three for each part.
The character of the protein has similarity to a transporter protein. The protein has
two conserved glycosylation signal triads.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; as an antisense sequence; or the like.
Modifications may include replacing oxygen of the phosphate esters with sulfur or
nitrogen, replacing the phosphate with phosphoramide, etc.

With the availability of the protein in large amounts by employing an
expression host, the protein may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified protein will generally
be at least about 80% pure, preferably at least about 90% pure, and may be up to
100% pure. By pure is intended free of other proteins, as well as cellular debris.

The polypeptide may be used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, whereas
larger fragments or the entire gene allow for the production of antibodies over the
surface of the polypeptide or protein, where the protein may be in its natural
conformation.

Antibodies may be prepared in accordance with conventional ways, where 
the expressed polypeptide or protein may be used as an immunogen, by itself or

obtain further 5' sequence to ensure that one has at least a functional portion of the
enhancer. It is found that the enhancer is proximal to the 5' coding region, a
portion being in the transcribed sequence and downstream from the promoter
sequences. The transcriptional initiation region may be used for many purposes:
studying embryonic development, providing for regulated expression of patched
protein or other protein of interest during embryonic development or thereafter, and
in gene therapy.

The gene may also be used for gene therapy, by transfection of the normal
gene into embryonic stem cells or into mature cells. A wide variety of viral vectors
can be employed for transfection and stable integration of the gene into the genome
of the cells. Alternatively, micro-injection, fusion, or the like may be employed for
introduction of genes into a suitable host cell. See, for example, Dhawan et al.,
Science 254, 1509-1512 (1991) and Smith et al., Molecular and Cellular Biology
(1990) 3268-3271.

By providing for the production of large amounts of PTC protein, one can
use the protein for identifying ligands which bind to the PTC protein. Particularly,
one may produce the protein in cells and employ the polysomes in columns for
isolating ligands for the PTC protein. One may incorporate the PTC protein into
liposomes by combining the protein with appropriate lipid surfactants, e.g.
phospholipids, cholesterol, etc., and sonicate the mixture of the PTC protein and the
surfactants in an aqueous medium. With one or more established ligands, e.g.
hedgehog, one may use the PTC protein to screen for antagonists which inhibit the
binding of the ligand. In this way, drugs may be identified which can prevent the
transduction of signals by the PTC protein in normal or abnormal cells.

The PTC protein, particularly binding fragments thereof, the gene encoding
the protein, or fragments thereof, particularly fragments of at least about 18
nucleotides, frequently of at least about 30 nucleotides and up to the entire gene,
more particularly sequences associated with the hydrophilic loops, may be employed
in a wide variety of assays. In these situations, the particular molecules will
normally be joined to another molecule, serving as a label, where the label can
directly or indirectly provide a detectable signal. Various labels include
radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules,
particles, e.g. magnetic particles, and the like. Specific binding molecules include
pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific
binding members, the complementary member would normally be labeled with a
molecule which provides for detection, in accordance with known procedures. The
assays may be used for detecting the presence of molecules which bind to the
patched gene or PTC protein, in isolating molecules which bind to the patched gene,
for measuring the amount of patched, either as the protein or the message, for
identifying molecules which may serve as agonists or antagonists, or the like.

Various formats may be used in the assays. For example, mammalian or
invertebrate cells may be designed where the cells respond when an agonist binds to
PTC in the membrane of the cell. An expression cassette may be introduced into
the cell, where the transcriptional initiation region of patched is joined to a marker
gene, such as β-galactosidase, for which a substrate forming a blue dye is available.
A 1.5 kb fragment that responds to PTC signaling has been identified and shown to
regulate expression of a heterologous gene during embryonic development. When
an agonist binds to the PTC protein, the cell will turn blue. By employing a
competition between an agonist and a compound of interest, absence of blue color
formation will indicate the presence of an antagonist. These assays are well known
in the literature. Instead of cells, one may use the protein in a membrane
environment and determine binding affinities of compounds. The PTC may be
bound to a surface and a labeled ligand for PTC employed. A number of labels
have been indicated previously. The candidate compound is added with the labeled
ligand in an appropriate buffered medium to the surface bound PTC. After an
incubation to ensure that binding has occurred, the surface may be washed free of
any non-specifically bound components of the assay medium, particularly any
non-specifically bound labeled ligand, and any label bound to the surface determined.
Where the label is an enzyme, substrate producing a detectable product may be used.
The label may be detected and measured. By using standards, the binding affinity of
the candidate compound may be determined.
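As a rough illustration of how readings from such a surface-bound competition assay might be reduced to a number, the Python sketch below converts label signal to percent specific binding and interpolates the competitor concentration that displaces half of the labeled ligand (an IC50). The specification does not prescribe this analysis; the function names, the control wells, the log-linear interpolation and all numeric values are assumptions made for illustration.

```python
import math


def percent_bound(signal, total_binding, nonspecific):
    """Specific binding of labeled ligand as a percentage of the
    no-competitor control, after subtracting nonspecific signal."""
    return 100.0 * (signal - nonspecific) / (total_binding - nonspecific)


def estimate_ic50(concentrations, percents):
    """Interpolate, on a log10 concentration scale, the competitor
    concentration at which 50% of the labeled ligand remains bound.
    Returns None when the data never cross 50%."""
    points = sorted(zip(concentrations, percents))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= 50.0 >= p_hi and p_lo != p_hi:
            frac = (p_lo - 50.0) / (p_lo - p_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical readings: molar competitor concentrations and percent bound.
concs = [1e-9, 1e-7]
bound = [75.0, 25.0]
print(estimate_ic50(concs, bound))
```

Converting an IC50 into an affinity constant requires further assumptions about the labeled ligand's own affinity and concentration, which is where the standards mentioned above come in.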
The availability of the gene and the protein allows for investigation of the
development of the fetus and the role patched and other molecules play in such
development. By employing antisense sequences of the patched gene, where the
sequences may be introduced in cells in culture, or a vector providing for
transcription of the antisense of the patched gene introduced into the cells, one can
investigate the role the PTC protein plays in the cellular development. By providing
for the PTC protein or fragment thereof in a soluble form which can compete with
the normal cellular PTC protein for ligand, one can inhibit the binding of ligands to
the cellular PTC protein to see the effect of variation in concentration of ligands for
the PTC protein on the cellular development of the host. Antibodies against PTC
can also be used to block function, since PTC is exposed on the cell surface.

The subject gene may also be used for preparing transgenic laboratory
animals, which may serve to investigate embryonic development and the role the
PTC protein plays in such development. By providing for variation in the
expression of the PTC protein, employing different transcriptional initiation regions
which may be constitutive or inducible, one can determine the developmental effect
of the differences in PTC protein levels. Alternatively, one can use the DNA to
knock out the PTC protein in embryonic stem cells, so as to produce hosts with only
a single functional patched gene or where the host lacks a functional patched gene.
By employing homologous recombination, one can introduce a patched gene, which
is differentially regulated, for example, is expressed in the development of the fetus
but not in the adult. One may also provide for expression of the patched gene in
cells or tissues where it is not normally expressed or at abnormal times of
development. One may provide for misexpression or failure of expression in
certain tissue to mimic a human disease. Thus, mouse models of spina bifida or
abnormal motor neuron differentiation in the developing spinal cord are made
available. In addition, by providing expression of PTC protein in cells in which it is
otherwise not normally produced, one can induce changes in cell behavior upon
binding of ligand to the PTC protein.

Areas of investigation may include the development of cancer treatments.
The wingless gene, whose transcription is regulated in flies by PTC, is closely
related to a mammalian oncogene, Wnt-1, a key factor in many cases of mouse
breast cancer. Other Wnt family members, which are secreted signaling proteins,
are implicated in many aspects of development. In flies, the signaling factor
decapentaplegic, a member of the TGF-beta family of signaling proteins, known to
affect growth and development in mammals, is also controlled by PTC. Since
members of both the TGF-beta and Wnt families are expressed in mice in places
close to overlapping with patched, [...] opportunity [...] cancer. [...]
by engineering cells of the [...] where [...] protein [...]
in an earlier developmental stage or is expressed.

Since Northern blot analysis indicates that ptc is present at high levels in
lung tissue, the regulation of ptc expression or binding to its ligand
may serve to inhibit proliferation of cancerous lung cells. The availability of the
gene encoding PTC and the expression of the gene allows for the development of
[...] differentiate early in development. The availability of the gene allows for the
introduction of PTC into host diseased tissue, stimulating the [...] program of
[...] and/or differentiation. This could be done in conjunction with other genes
which provide for the ligands which regulate PTC activity or by providing for
agonists other than the natural ligand.

The availability of the coding region for various ptc genes from various
species allows for the isolation of the 5' non-coding region comprising the promoter
and enhancer associated with the ptc genes, so as to provide transcriptional and
post-transcriptional [...]
autoregulated, activation of the ptc gene [...]
gene under the transcriptional control of the transcriptional initiation region of the
ptc gene. The transcriptional initiation region may be obtained from any host
[...] from about 1.5 kb upstream from the initiation codon, up to about 10 kb,
preferably [...]
the PTC protein, particularly the Drosophila 5'-non-coding region (GenBank
accession no. M28418).

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

I. PCR on Mosquito (Anopheles gambiae) Genomic DNA.

PCR primers were based on amino acid stretches of fly PTC that were not
likely to diverge over evolutionary time and were of low degeneracy. Two such
primers, P2R1 (SEQ ID NO:14): GGACGAATTCAARGTNCAYCARYTNTGG and
P4R1 (SEQ ID NO:15): GGACGAATTCYTCCCARAARCANTC (the
underlined sequences, here the leading GGACGAATTC, are Eco RI linkers), amplified
an appropriately sized band from mosquito genomic DNA using the PCR. The program
conditions were as follows:

94 °C 4 min.; 72 °C Add Taq;
[49 °C 30 sec.; 72 °C 90 sec.; 94 °C 15 sec.] 3 times
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

This band was subcloned into the EcoRV site of pBluescript II and sequenced using
the USB Sequenase kit.
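The "low degeneracy" of the two primers can be checked with a small Python sketch that multiplies the number of bases each IUPAC ambiguity code stands for. The primer strings below are the sequences as read from the text of this example, with the linker GGACGAATTC reconstructed from a damaged passage, so they should be treated as an assumption; the function and variable names are hypothetical.

```python
# Number of nucleotides represented by each IUPAC code.
IUPAC_MULTIPLICITY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer):
    """Count the distinct oligonucleotides in a degenerate primer pool."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_MULTIPLICITY[base]
    return total


ECORI_LINKER = "GGACGAATTC"  # assumed reading; contains the EcoRI site GAATTC
P2R1 = ECORI_LINKER + "AARGTNCAYCARYTNTGG"
P4R1 = ECORI_LINKER + "YTCCCARAARCANTC"

for name, seq in (("P2R1", P2R1), ("P4R1", P4R1)):
    binding = seq[len(ECORI_LINKER):]
    print(name, len(binding), primer_degeneracy(seq))
```

The binding portions are 18 and 15 nucleotides long, within the 14 to 30 nucleotide range given earlier for the complementary binding sequence, and the pools contain 256 and 32 distinct oligonucleotides respectively.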



II. Screen of a Butterfly cDNA Library with Mosquito PCR Product.

Using the mosquito PCR product (SEQ ID NO:7) as a probe, a 3 day
embryonic Precis coenia λgt10 cDNA library (generously provided by Sean
Carroll) was screened. Filters were hybridized at 65 °C overnight in a solution
containing 5x SSC, 10% dextran sulfate, 5x Denhardt's, 200 μg/ml sonicated
salmon sperm DNA, and 0.5% SDS. Filters were washed in 0.1x SSC, 0.1% SDS
at room temperature several times to remove nonspecific hybridization. Of the
100,000 plaques initially screened, 2 overlapping clones, L1 and L2, were isolated
which corresponded to the N terminus of butterfly PTC. Using one of these clones
as a probe, the library filters were rescreened and 3 additional clones (L5, L7, L8)
were isolated which encompassed the remainder of the ptc coding sequence. The
full length sequence of butterfly ptc (SEQ ID NO:3) was defined by ABI automated
sequencing.

III. Screen of a Beetle Genomic Library with Mosquito PCR Product and
Fragment from the Butterfly Clone.

A λgem11 genomic library from Tribolium castaneum (gift of Rob Denell)
was probed with a mixture of the mosquito PCR product (SEQ ID NO:7) and a
BstXI/EcoRI fragment of L2. Filters were hybridized at 55 °C overnight and
washed as above. Of the 75,000 plaques screened, [...] clones were identified and the
SacI fragment of T8 (SEQ ID NO:1), which cross-hybridized with the mosquito and
butterfly probes, was subcloned into pBluescript.

IV. PCR on Mouse cDNA Using Degenerate Primers.

Two degenerate PCR primers, P4REV (SEQ ID NO:16): [...] and
P22 (SEQ ID NO:17): [...], were designed based on a
comparison of PTC amino acid sequences from fly (Drosophila melanogaster) (SEQ
ID NO:6), mosquito (Anopheles gambiae) (SEQ ID NO:7), butterfly (Precis
coenia) (SEQ ID NO:4), and beetle (Tribolium castaneum) (SEQ ID NO:2). I
represents inosine, which can form base pairs with all four nucleotides. P22 was
used to reverse transcribe RNA from 12.5 dpc mouse limb bud (gift from David
Kingsley) for 90 min at 37 °C. PCR using P4REV (SEQ ID NO:17) and P22 (SEQ
ID NO:18) was then performed on 1 μl of the resultant cDNA under the following
conditions:

94 °C 4 min.; 72 °C Add Taq;
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

PCR products of the expected size were subcloned into the TA vector (Invitrogen)
and sequenced with the Sequenase Version 2.0 DNA Sequencing Kit (U.S.B.).
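For planning purposes, a cycling program of the kind listed for the mouse RT-PCR can be written down as data and its programmed hold time summed, as in the Python sketch below. The representation (a list of repeat counts with temperature and duration steps) and the names are hypothetical; the sum leaves out the manual pause for adding Taq, the final 4 °C hold, and instrument ramp times.

```python
def program_seconds(program):
    """Sum the programmed hold times of a thermocycler program.

    `program` is a list of (repeats, steps) blocks, where `steps` is a list
    of (temperature_celsius, seconds) tuples.
    """
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in program
    )


# Mouse RT-PCR program as listed above (manual Taq addition and 4 °C hold omitted).
MOUSE_RT_PCR = [
    (1, [(94, 240)]),                        # initial denaturation, 4 min
    (35, [(94, 15), (50, 30), (72, 90)]),    # 35 amplification cycles
    (1, [(72, 600)]),                        # final extension, 10 min
]

total = program_seconds(MOUSE_RT_PCR)
print(total, "seconds, about", round(total / 60, 1), "minutes")
```

The same structure accommodates the mosquito program by adding a block of three low-stringency cycles ahead of the 35 main cycles.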
Using the cloned mouse PCR fragment as a probe, 300,000 plaques of a 
_ 8.5 dpc Xgt.0 cDNA library (a gift from Brigid Hogan) were screened at 
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65 °C as above and washed in 2x SSC, 0. 1 % SDS at room temperature. 7 clones 
were isolated, and three (M2 M4, and M8) were subcloned into pBluescript II. 
200,000 plaques of this library were rescreened using first, a 1.1 kb EcoRI fragment 
from M2 to identify 6 clones (M9-M16) and secondly a mixed probe containing the 
5 most N terminal (Xhol fragment from M2) and most C terminal sequences 

(BamHI/Bgin fragment from M9) to isolate 5 clones (M17-M21). M 9, M10, MM, 
and M17-21 were subcloned into the EcoRI site of pBluescript n (Strategene). 
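
The degenerate primers used for the RT-PCR step above rely on IUPAC ambiguity codes and inosine. The sketch below shows how the degeneracy (number of distinct oligonucleotides in the mix) of such a primer can be counted. The example primer is hypothetical: it simply encodes the residues W-E-Q-Y, which are conserved in all four rows of Table 1, and it is not P4REV or P22. Counting inosine as a single species is a convention assumed here.

```python
# Number of bases each IUPAC nucleotide code stands for.
# "I" (inosine) is counted as 1: it is a single base that can pair with all four
# nucleotides, so it does not multiply the number of oligonucleotide species.
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= IUPAC_OPTIONS[base]
    return count


# Hypothetical primer for Trp-Glu-Gln-Tyr: TGG GAR CAR TAY.
EXAMPLE_PRIMER = "TGGGARCARTAY"
print(EXAMPLE_PRIMER, degeneracy(EXAMPLE_PRIMER))

# Replacing a fourfold-degenerate position (N) with inosine removes that factor.
print("CCN", degeneracy("CCN"), "versus CCI", degeneracy("CCI"))
```

Choosing stretches rich in low-degeneracy residues such as Trp and Met, and substituting inosine at fourfold-degenerate positions, keeps the primer pool small.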

V. 

Northerns:

A mouse embryonic Northern blot and an adult multiple tissue Northern blot
(obtained from Clontech) were probed with a 900 bp EcoRI fragment from an N
terminal coding region of mouse ptc. Hybridization was performed at 65 °C in 5x
SSPE, 10x Denhardt's, 100 µg/ml sonicated salmon sperm DNA, and 2% SDS.
After several short room temperature washes in 2x SSC, 0.05% SDS, the blots were
washed at high stringency in 0.1x SSC, 0.1% SDS at 50 °C.
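
The wash solutions above are normally prepared by dilution from concentrated stocks. A minimal sketch of that arithmetic (C1·V1 = C2·V2) follows. The 20x SSC and 10% SDS stock strengths and the 1 litre batch size are assumptions of this example and are not stated in the text.

```python
def wash_recipe(final_ssc_x, final_sds_pct, total_ml,
                stock_ssc_x=20.0, stock_sds_pct=10.0):
    """Volumes (ml) of SSC stock, SDS stock and water for one wash solution.

    Stock strengths default to 20x SSC and 10% SDS, which are assumed values.
    """
    ssc_ml = total_ml * final_ssc_x / stock_ssc_x
    sds_ml = total_ml * final_sds_pct / stock_sds_pct
    water_ml = total_ml - ssc_ml - sds_ml
    return {"ssc_ml": ssc_ml, "sds_ml": sds_ml, "water_ml": water_ml}


# Low-stringency wash from the text: 2x SSC, 0.05% SDS.
print(wash_recipe(2.0, 0.05, 1000.0))

# High-stringency wash from the text: 0.1x SSC, 0.1% SDS.
print(wash_recipe(0.1, 0.1, 1000.0))
```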
In situ hybridization of sections:

7.75, 8.5, 11.5, and 13.5 dpc mouse embryos were dissected in PBS and
frozen in Tissue-Tek medium at -80 °C. 12-16 µm frozen sections were cut,
collected onto VectaBond (Vector Laboratories) coated slides, and dried for 30-60
minutes at room temperature. After a 10 minute fixation in 4% paraformaldehyde in
PBS, the slides were washed 3 times for 3 minutes in PBS, acetylated for 10 minutes
in 0.25% acetic anhydride in triethanolamine, and washed three more times for 5
minutes in PBS. Prehybridization (50% formamide, 5X SSC, 250 µg/ml yeast
tRNA, 500 ng/ml sonicated salmon sperm DNA, and 5x Denhardt's) was carried
out for 6 hours at room temperature in 50% formamide/5x SSC humidified
chambers. The probe, which consisted of 1 kb from the N-terminus of ptc, was
added at a concentration of 200-1000 ng/ml into the same solution used for
prehybridization, and then denatured for five minutes at 80 °C. Approximately 75
µl of probe were added to each slide and covered with Parafilm. The slides were
incubated overnight at 65 °C in the same humidified chamber used previously. The
following day, the probe was washed successively in 5X SSC (5 minutes, 65 °C),
0.2X SSC (1 hour, 65 °C), and 0.2X SSC (10 minutes, room temperature). After
five minutes in buffer B1 (0.1 M maleic acid, 0.15 M NaCl, pH 7.5), the slides were
blocked for 1 hour at room temperature in 1% blocking reagent
(Boehringer-Mannheim) in buffer B1, and then incubated for 4 hours in buffer B1
containing the DIG-AP conjugated antibody (Boehringer-Mannheim) at a 1:5000
dilution. Excess antibody was removed during two 15 minute washes in buffer B1,
followed by five minutes in buffer B3 (100 mM Tris, 100 mM NaCl, 5 mM MgCl2,
pH 9.5). The antibody was detected by adding an alkaline phosphatase substrate
(350 µl of 75 mg/ml X-phosphate in DMF and 450 µl of 50 mg/ml NBT in 70% DMF
in 100 ml of buffer B3) and allowing the reaction to proceed overnight in the dark.
After a brief rinse in 10 mM Tris, 1 mM EDTA, pH 8.0, the slides were mounted
with Aquamount (Lerner Laboratories).
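
The working concentrations of the two chromogenic substrates follow directly from the stock volumes given above. The short sketch below works them out. The helper name is hypothetical, and it is assumed that the stock volumes add to the 100 ml of buffer B3 rather than being included in it.

```python
def final_mg_per_ml(stock_mg_per_ml, stock_ul, total_ml):
    """Concentration after diluting `stock_ul` of stock into `total_ml` total volume."""
    return stock_mg_per_ml * (stock_ul / 1000.0) / total_ml


BUFFER_ML = 100.0
X_PHOSPHATE = (75.0, 350.0)   # (stock mg/ml, volume added in microlitres)
NBT = (50.0, 450.0)

# Assumption: the two stock additions increase the total volume slightly.
total_ml = BUFFER_ML + (X_PHOSPHATE[1] + NBT[1]) / 1000.0

x_final = final_mg_per_ml(*X_PHOSPHATE, total_ml)
nbt_final = final_mg_per_ml(*NBT, total_ml)
print(f"X-phosphate: {x_final:.3f} mg/ml, NBT: {nbt_final:.3f} mg/ml")
```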

VI. Drosophila 5'-transcriptional initiation region β-gal constructs.

A series of constructs was designed that link different regions of the ptc
promoter from Drosophila to a LacZ reporter gene in order to study the cis
regulation of the ptc expression pattern. See Fig. 1. A 10.8 kb BamHI/BspMI
fragment comprising the 5'-non-coding region of the mRNA at its 3'-terminus was
obtained and truncated by restriction enzyme digestion as shown in Fig. 1. These
expression cassettes were introduced into Drosophila lines using a P-element vector
(Thummel et al., Gene 74, 445-456 (1988)), which were injected into embryos,
providing flies which could be grown to produce embryos. (See Spradling and
Rubin, Science (1982) 218, 341-347 for a description of the procedure.) The vector
used a pUC8 background into which was introduced the white gene to provide for
yellow eyes, portions of the P-element for integration, and the constructs were
inserted into a polylinker upstream from the LacZ gene. The resulting embryos
were stained using antibodies to LacZ protein conjugated to HRP and the embryos
developed with OPD dye to identify the expression of the LacZ gene. The staining
pattern is described in Fig. 1, indicating whether there was staining during the early
and late development of the embryo.



VII. Isolation of a Mouse ptc Gene

Homologues of fly PTC (SEQ ID NO:6) were isolated from three insects:
mosquito, butterfly and beetle, using either PCR or low stringency library screens.
PCR primers to six amino acid stretches of PTC of low mutability and
degeneracy were designed. One primer pair, P2 and P4, amplified an homologous
fragment of ptc from mosquito genomic DNA that corresponded to the first
hydrophilic loop of the protein. The 345 bp PCR product (SEQ ID NO:7) was
subcloned and sequenced and, when aligned to fly PTC, showed 67% amino acid
identity.

The cloned mosquito fragment was used to screen a butterfly λgt10 cDNA
library. Of 100,000 plaques screened, five overlapping clones were isolated and
used to obtain the full-length coding sequence. The butterfly PTC homologue (SEQ
ID NO:4), 1311 amino acids long, overall has 50% amino acid identity (72%
similarity) to fly PTC. With the exception of a divergent C-terminus, this homology
is evenly spread across the coding sequence. The mosquito PCR clone (SEQ ID
NO:7) and a corresponding fragment of butterfly cDNA were used to screen a beetle
λgem11 genomic library. Of the plaques screened, U clones were identified. A
fragment of one clone (T8), which hybridized with the original probes, was
subcloned and sequenced. This 3 kb piece contains an 89 amino acid exon (SEQ ID
NO:2) which is 44% and 51% identical to the corresponding regions of fly and
butterfly PTC respectively.
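
Percent identity figures such as those quoted above are computed column by column over a pairwise alignment. The sketch below shows one common convention, assumed here, in which columns containing a gap in either sequence are left out of the denominator. The two example strings are the first 18 residues of the human and fly rows of one Table 1 block as they appear in this text, used only for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


human_fragment = "VLAYKLLVQTGSRDKPID"
fly_fragment = "ILAYKLIVQTGHVDNPVD"
print(f"{percent_identity(human_fragment, fly_fragment):.1f}% identity")
```

Similarity scores additionally count conservative substitutions and depend on the substitution groups or matrix chosen, which the text does not specify.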

[...] of the PTC, two PCR primers were designed to a five and a six amino acid
stretch which were identical and of low degeneracy. These primers were used to
isolate the mouse homologue using RT-PCR on embryonic limb bud RNA. An
appropriately sized band was amplified and, upon cloning and sequencing, it was
found to encode a protein a% identical to fly PTC. Using the cloned PCR product
and, subsequently, fragments of mouse ptc cDNA, a mouse embryonic λ cDNA
library was screened. From about 300,000 plaques, 17 clones were identified and of
these 7 form overlapping cDNAs which comprise most of the protein-coding
sequence (SEQ ID NO:9).


VIIa.

PCT/US95/13233 

WO 96/11260 

[...] the ptc probe detects a single 8 kb message. Developmentally, ptc mRNA
is present in low levels as early as 7 dpc and becomes [...] is still present at
17 dpc, the Northern blot indicates a clear decrease [...]. In the adult, [...]
moderate amounts in the kidney and liver. Weak signals [...] spleen, skeletal
muscle, and testes.

VIIb.

[...] 7 dpc, while there is no detectable signal in sections from 7.75 dpc
[...] explained by the low level of transcription. In contrast, [...] the neural
axis of 8.5 dpc embryos. By [...] cartilage of 11.5 and 13.5 dpc limb buds, as
well as [...] portion of the somites, a region which is prospective sclerotome
and eventually forms bone in the vertebral column. ptc is present in a wide
range of tissues from endodermal, mesodermal and ectodermal origin,
supporting its fundamental role in embryonic development.

VIII.

To isolate human ptc (hptc), [...] plaques from a human lung cDNA library
(HL3022a, Clontech) were screened with a 1 kbp mouse ptc fragment, MM.
[...] (60 °C in 5X SSC, 10% dextran sulfate, 5X Denhardt's, [...] SDS). Two
positive plaques (H1 and H2) were isolated, the inserts cloned into [...] mouse
ptc homolog. [...] in duplicate with MM EcoRI and M2-3 Xho (containing 5'
untranslated sequence of mouse ptc) probes. Ten plaques were purified and of
these, 6 inserts were subcloned into pBluescript. To obtain the full coding
sequence, H2 was fully and H14, H20, and H21 were partially sequenced. The
5.1 kbp of human ptc sequence (SEQ ID NO:18) contains an open reading frame
of 1447 amino acids (SEQ ID NO:19) that is 96% identical and 98% similar to
mouse ptc. The 5' and 3' untranslated sequences of human ptc (SEQ ID NO:18)
are also highly similar to mouse ptc (SEQ ID NO:9), suggesting conserved
regulatory sequence.
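
The relationship between the reported open reading frame and the length of the cloned sequence can be checked with simple arithmetic. The sketch below does so. The function name is hypothetical, and because "5.1 kbp" is a rounded figure, the non-coding remainder is only approximate.

```python
def orf_nucleotides(n_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein, optionally counting the stop codon."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


HUMAN_ORF_AA = 1447        # from the text
CLONED_LENGTH_NT = 5100    # "5.1 kbp", rounded in the text

coding_nt = orf_nucleotides(HUMAN_ORF_AA)
noncoding_nt = CLONED_LENGTH_NT - coding_nt
print(f"coding: {coding_nt} nt, approximate 5' plus 3' untranslated: {noncoding_nt} nt")
```

Under these figures the bulk of the cloned sequence is coding, with several hundred nucleotides left for the 5' and 3' untranslated regions that the text compares with mouse.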

IX. Comparison of Mouse, Human, Fly and Butterfly Sequences

The deduced mouse PTC protein sequence (SEQ ID NO:10) has about 38%
identical amino acids to fly PTC over about 1,200 amino acids. This amount of
conservation is dispersed through much of the protein excepting the C-terminal
region. The mouse protein also has a 50 amino acid insert relative to the fly
protein. Based on the sequence conservation of PTC and the functional conservation
of hedgehog between fly and mouse, one concludes that ptc functions similarly in
the two organisms. A comparison of the amino acid sequences of mouse (mptc)
(SEQ ID NO:10), human (hptc) (SEQ ID NO:19), butterfly (bptc) (SEQ ID NO:4)
and Drosophila (ptc) (SEQ ID NO:6) is shown in Table 1.

TABLE 1

Alignment of human, mouse, fly, and butterfly PTC homologs
HPTC 
MPTC 



MA^AGNAA^ GGGGSGCI GAPGRPAGGGRRRRTGGLRRAAAPDRDYLHR PS Y CDA 

mvapdseapsnprit^hespcaS- KK^~:S:SSS£ 



alalselekgnieggrtslwirawlqeqlfi lgcflqgdagkvlfvai lvlstfcvglks 
*• .* * ** .».*** *..* . *#**. 

***• *• ** . **.•** . ... . 

hptc ldsalc^vsrvhvymy>wqwiclehlcyksgelitet-gymdqiieyi.ypcliitpi 1 dcfwe 
* *.**.*•.**■•* 

_ _ v _ v-t NYOVDSWEEMLNKAEV 
GAKLQSGTAyLLGKPPLR ^S"iKSiK--»--KINVQVDSW E EMI^V 






;5sv^sTOK----vwi™^;^!2"!!yi„ 3VavTV LYArcTi.LiimDP 



RKI 




DIPGSS 
ENVTKT 



* 
HPTC LHRSFSWKYVMLEENKgLPKMWIJlYFRDWLOGLQDAFDSDWETGKIMPNN-YKKGSDDG
MPTC LHKS FSNVKYVMLEENKQLPQMff LHYFRDWLQGLQDAFDSDWETGR IMPNN- YKNGSDDG
PTC  YHDSFVRVPHVIKNDNGGLPDFWLLLFSEWLGNLQKIFDEEYRDGRLTKECWFPNASSDA
BPTC YHDQFVRIPKIIKNDNGGLTKFWLSLFRDWLLDLQVAFDKEVASGCITQEYWC1CNA5DEG
     * * * * ** * # ** ** ** * . , *.+

HPTC VLAYKLLVQTGSRDKPIDISQLTK-QRLVDADGIINPSAFYIYLTAWVSNDPVAYAASQA
MPTC VLAYKLLVQTGSRPKPIDISQLTK-QRLVDADGIINPSAFYIYLTAWVSNDPVAYAASQA
PTC  ILAYKLIVQTGHVDNPVDKELVLT*NRLVNSDGIINQRAFYNY1»SAWATNDVFAYGASQG
BPTC I LAYKLMVQTGHVDNPIDKS LITAGHRLVDKDGI INPKAFYNYLSAWATNDALAY GASQG
     ********* *.*.* . .***. ***** *** **.** .** **.***.

HPTC NIRPHRPEWVHDKADYMPETRIAIPAAEPIEYAQFPFYLNGIADTSDFVEAIEKVRTICS
MPTC NIRPHRPEWVHDKADYMPETRLRIPAAEPIEYAQFPFYLNGLRDTSDFVEAIEKVRVICN
PTC  KLYPEPRQYFHQPNEY DLKI PKS LPLVYAQMP FYLHGLTDTSQ I KTLI GHI RD LS V
BPTC NLKPQPQRWIHSPEDV HLEIKKSSPLIYTQLPFYLSGLSDTDSIKTLIRSVRDLCL
     + * ***** **** ** ** , * . * .

HPTC tTYTSLGLSSYPNGYPFLFWEQYIGLRHWLMiFISVVIACTFL^^
MPTC NYTSLGLSSYPNGYPFLFWEQYISLRHWLLI^ISWIiACTFLVCAVFLLNPWTAGIIVMV
PTC  KYE GFGLPNY PS GI PFI FWEQ YMT LRS S LAMI LACVLLAALVLVS LLLLS VWAAVLVT LS
BPTC KYEAKGLPNFPS GI P FLFWEQY LY LRTS LLLALACALGAVFI AVMVLLLNAWAAVLVTIA
     * ** * * ** ***** ** * * ..**. *.* .* .

HPTC iJtfKrVELFGMKGLIGIKI^AVPVVILIASVGIGTO
MPTC LALMTVELFGMMGLIGIKI^AVPWILIASVGIGVXFTVHV
PTC  VLAS LAQI FGAMTLXiGI KLSAI PAVI LI LS VGMMLCFNVLI SLGFMTSVGNRQRRVQLSM
BPTC iJVTLVLQLLGVMALLGVKI^AMPPVLLVLAIGRGVTiFTV^
     * * * * **** * *** * * * * *_# * . * . .

HPTC EHMFAFVLDGAVSTLLGVU4LAGSE FDFIVRYFFAVLAILTI LGVLNGLVLLPVLLS FFG
MPTC EHMFAPVIJ}GAVSTLLGVLMIAGSEFDFIVRYFFAVIAILTVL
PTC  QMSLGPLVHGMLTSGVAVFMLSTSPFEFVIRHFCWLLLWLCVGACNSLLVFPILLSMVG
BPTC ESVIAPVVHGAIAAALAASMIAASEFGFVARLFLRLLLALVFLGLITC
     .,*.„.* **. * * *. * * .* . .* •*..** *

HPTC PYPEVSPANGLNRLPTPSPEPPPSWRFAMPPGHTHSGSDSSDSEYSSQTTVSGLSE-EL
MPTC PCPEVSPANGLNRLPTPSPEPPPSVVRFAVPPGHTWGSDSSDSEYSSQTTVSGISE-EL
PTC  PEAELVPLEHPDRI ST PS PLPVRS SKRSGKS YWQGSRSSRGSCQKSHHHHHKDLND PS L
BPTC PAAEVRPIEHPERLSTPSPKCSPIHPRKSSSSSGGGDKSSRTS-- KSAPRPC APSL
     * * * * **** * * * *



HPTC RHYEAQQGAGGPAHQVTVEATENPVFAHSTWHPE SRHHPPSNPRQQ PHLDSGS L P PGRQ
MPTC RQYEAQQGAGGPAHQVIVEATENFVFARSTVVHPDSRHQPPLTPRQQPHLDSGSLSPGRQ
PTC  TTITEEPQSWKSSNSSIQMPNDWTYQPREQ-- RPASYAAPPPAYHKAAAQQHHQHQGPPT
BPTC TTITEEPSSWHSSAHSVQSSMQSIWQPEVVVETTTYNGSDSASGRSTPTKSSHGGAITT

HPTC GQQPRRDPPREGLWPPLYRPRRDAFEISTEGHSGPSNRARWGPRGARSHNPRNPASTAMG
MPTC GQQPRRDPPREGLRPPPYRPRRnAFEISTEGHSGPSNRDRSGPRGARSHNPRNPTSTAMG
PTC  TPPPPFPTA : Y PPELQS I WQPEVTVETTHS DS
BPTC TKVTATANIKVEVVTPSDRKSRRSYHYYDRRRDRDEDRDRDRERDRDRDRDRDRDRDRDR

HPTC SSVPGYCQPITTVTASASVTVAVHPPFVPGPGRNPRGGLCPGY PE TDHGLFEDPHVP
MPTC SSVPSYCQPITTVTASASVTVAVHPP -- PGPGRNPRGGPCPGYESYPETDHGVFEDPHVP
PTC  NT TKVTATANI KVE LAMP GRAVRS YNFTS
BPTC DR DRERSRERDRRDRYRD ERDHRA SPREKGRDSGHE

HPTC FHVRCERRDSKVEVIELQDVECEERPRGSSSN
MPTC FHVRCERRDSKVEVIELQDVECEERPWGSSSN
     -SDSSRH
The identity of ten other clones recovered from the mouse library has not been
determined. These cDNAs cross-hybridize with mouse ptc sequence, while differing
as to their restriction maps. These genes encode a family of proteins related to the
patched protein. Alignment of the human and mouse nucleotide sequences, which
includes coding and noncoding sequence, reveals 89% identity.

In accordance with the subject invention, mammalian patched genes, including
the mouse and human genes, are provided which allow for high level production of
the patched protein, which can serve many purposes. The patched protein may be
used in a screening for agonists and antagonists, for isolation of its ligand,
particularly hedgehog, more particularly Sonic hedgehog, and for assaying for the
transcription of the mRNA ptc. The protein or fragments thereof may be used to
produce antibodies specific for the protein or specific epitopes of the protein. In
addition, the gene may be employed for investigating embryonic development, by
screening fetal tissue, preparing transgenic animals to serve as models, and the like.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of the teachings of this invention
that certain changes and modifications may be made thereto without departing from
the spirit or scope of the appended claims.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR
UNIVERSITY

(ii) TITLE OF INVENTION: Patched Genes and Their Use

(iii) NUMBER OF SEQUENCES: 19

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert
(B) STREET: Four Embarcadero Center, Suite 3400
(C) CITY: San Francisco
(D) STATE: CA
(E) COUNTRY: US
(F) ZIP: 94111

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US95/
(B) FILING DATE: 06-OCT-1995
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Rowland, Bertram I
(B) REGISTRATION NUMBER: 20015
(C) REFERENCE/DOCKET NUMBER: a60190-1

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 415-781-1989
(B) TELEFAX: 415-398-3249

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 736 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
AACNNCNNTN NATCCCACCC CCNCCCAACC TTTNNNCCNN NTAANCAAAA NNCCCCNTTT 60
NATACCCCCT NTAANANTTT TCCACCNNNC NNAAANNCCN CTGNANACNA NGNAAANCCN 120
TTTTTNAACC CCCCCCACCC GOAATTCCNA NTNNCCNCCC CCAAATTACA ACTCCAGNCC 180
AAAATTNANA NAATTGGTCC TAACCTAACC NATNGTTGTT ACGGTTTCCC CCCCCAAATA 240
CATGCACTGG CCCGAACACT TGATCGTTGC CGTTCCAATA ACAATAAATC TGCTCATATT 300
AAACAAGCCN AAAGCTTTAC AAACTGTTGT ACAATTAATG GGCGAACACG AACTGTTCGA 360
ATTCTGGTCT GGACATTACA AAGTGCACCA CATCGGATGG AACCAGGAGA AGGCCACAAC 420
CGTACTGAAC GCCTGGCAGA AGAAGTTCGC ACAGGTTGGT GGTTGGCGCA AGGAGTAGAG 480
TGAATGGTGG TAATTTTTGG TTGTTCCAGG AGGTGGATCG TCTGACGAAG AGCAAGAAGT 540
CGTCGAATTA CATCTTCGTG ACGTTCTCCA CCGCCAATTT GAACAAGATG TTGAAGGAGG 600
CGTCGAANAC GGACGTGGTG AAGCTGGGGG TGGTGCTGGG GGTGGCGGCG GTGTACGGGT 660
GGGTGGCCCA GTCGGGGCTC GCTGCCTTGG GAGTGCTGGT CTTNGCGNGC TNCNATTCGC 720
CCTATAGTNA GNCGTA 736

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 107 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Xaa Pro Pro Pro Asn Tyr Asn Ser Xaa Pro Lys Xaa Xaa Xaa Leu Val
1               5                   10                  15

Leu Thr Pro Xaa Val Val Thr Val Ser Pro Pro Lys Tyr Met His Trp
            20                  25                  30

Pro Glu His Leu Ile Val Ala Val Pro Ile Arg Ile Asn Leu Val Ile
        35                  40                  45

Leu Asn Lys Pro Lys Ala Leu Gln Thr Val Val Gln Leu Met Gly Glu
    50                  55                  60

His Glu Leu Phe Glu Phe Trp Ser Gly His Tyr Lys Val His His Ile
65                  70                  75                  80

Gly Trp Asn Gln Glu Lys Ala Thr Thr Val Leu Asn Ala Trp Gln Lys
                85                  90                  95

Lys Phe Ala Gln Val Gly Gly Trp Arg Lys Glu
            100                 105
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5187 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CGGTCTGTCA CCCGOAOCCC GAGTCCCCGG CGCCCAGCAG CGTCCTCGCG AGCCGAGCGC 60
CCAGGCGCGC CCGGAGCCCG CGGCGGCCGC GGCAACATGG CCTCGGCTGG TAACGCCGCC 120
GGGGCCCTGG GCAGGCAGGC CGCCGGCGGC ACGCGCACAC GGACCGGGGG ACCGCACCGC 180
CCCGCGCCGG ACCGGGACTA TCTGCACCGG CCCAGCTACT GCGACGCCGC CTTCGCTCTG 240
GAGCAGATTT CCAAGGGGAA GGCTACTGGC CGGAAAGCGC CGCTGTGGCT GAGAGCGAAG 300
TTTCAGAGAC TCTTATTTAA ACTGGGTTGT TACATTCAAA AGAACTGCGG CAAGTTTTTC 360
GTTGTGGGTC TCCTCATATT TGGGCCCTTC GCTGTGGGAT TAAAGGCAGC TAATCTCGAG 420
ACCAACGTGG AGGAGCTGTG GGTGGAAGTT GGTGGACGAG TGAGTCCAGA ATTAAATTAT 480
ACCCGTCAGA AGATAGGAGA AGAGGCTATG TTTAATCCTC AACTCATGAT ACAGACTCCA 540
AAAGAAGAAG GCGCTAATCT TCTGACCACA GAGGCTCTCC TGCAACACCT GGACTCAGCA 600
CTCCAGGCCA GTCGTGTGCA CGTCTACATG TATAACAGGC AATGGAAGTT GGAACATTTG 660
TGCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 720
CTTTACCCTT GCTTAATCAT TACACCTTTG GACTGCTTCT GGGAAGGGGC AAAGCTACAG 780
TCCGCGACAG CATACCTCCT AGGTAAGCCT CCTTTACGGT GGACAAACTT TGACCCCTTG 840
GAATTCCTAG AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGCTGGGA GGAAATGCTG 900
AATAAAGCCG AAGTTGGCCA TGGGTACATG GACCCCCCTT GCCTCAACCC AGCCGACCCA 960
GATTGCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTGATGT GGCCCTTGTT 1020
TTGAATGGTG GATGTCAAGG TTTATCCAGG AAGTATATGC ATTGGCAGGA GGAGTTGATT 1080
GTGGGTGGTA CCGTCAAGAA TGCCACTGCA AAACTTGTCA GCGCTCACGC CCTGCAAACC 1140
ATGTTCCACT TAATGACTCC CAAGCAAATG TATGAACACT TCAGGGGCTA CGACTATGTC 1200
TCTCACATCA ACTGGAATGA ACACAGGGCA GCCGCCATCC TGGAGGCCTG GCAGAGGACT 1260
TACGTGGAGG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 1320
ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTG ATGTCAGTGT CATCCGAGTG 1380
GCCAGCGGCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440
TCCAAGTCCC AGGGTGCCGT GGGGCTGGCT GGCGTCCTGT TGGTTGCGCT GTCAGTGGCT 1500
GCAGGATTGG GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 1560
TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620
AGTGAAACAG GACAGAATAA GAGGATTCCA TTTGAGGACA GGACTGGGGA GTGCCTCAAG 1680
CGCACCGGAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGGCC 1740
GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800
TTCAATTTTG CTATGGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860
CGTGAGGACA GAAGATTGGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980
CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040
CAGCTCCGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 2100
TCTGAGATCT CTGTACAGCC TGTTACCGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160
GAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220
CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280
TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGG 2340
GTCAGCCTTT ATGGGACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGGA CATTGTTCCC 2400
CGGGAAACCA GAGAATATGA CTTCATAGCT GCCCAGTTCA AGTACTTCTC TTTCTACAAC 2460
ATGTATATAG TCACCCAGAA AGCAGACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520
CATAAGAGTT TCAGCAATGT GAAGTATGTC ATGCTGGAGG AGAACAAGCA ACTTCCCCAA 2580
ATGTGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640
TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCAGATGA CGGGGTCCTC 2700
GCTTACAAAC TCCTGGTGCA GACTGGCAGC CGAGACAAGC CCATCGACAT TAGTCAGTTG 2760
ACTAAACAGC GTCTGGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820
CTGACCGCTT GGGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCCGG 2880
CCTCACCGGC CGGAGTGGGT CCATGACAAA GCCGACTACA TGCCAGAGAC CAGGCTGAGA 2940
ATCCCAGCAG CAGAGCCCAT CGAGTACGCT CAGTTCCCTT TCTACCTCAA CGGCCTACGA 3000
GACACCTCAG ACTTTOTOOA AGCCATAGAA AAAGTGAGAG TCATCTGTAA CAACTATACG 3060
AGCCTGGGAC TGTCCAGCTA CCCCAATGGC TACCCCTTCC TGTTCTGGGA GCAATACATC 3120
AGCCTGCGCC ACTGOCTGCT GCTATCCATC AGCGTGGTGC TGGCCTCCAC GTTTCTAGTG 3180
TGCGCAGTCT TCCTCCTGAA CCCCTGGACG GCCGGGATCA TTGTCATGGT CCTGGCTCTG 3240
ATGACCGTTG AGCTCTTTGO CATGATGGGC CTCATTGGGA TCAAGCTGAG TGCTGTGCCT 3300
GTGGTCATCC TGATTCCATC TGTTGCCATC GGAGTGGAGT TCACCGTCCA CGTGGCTTTG 3360
GCCTTTCTGA CAGCCATTGO GGACAAGAAC CACAGGGCTA TGCTCGCTCT GGAACACATG 3420
TTTGCTCCCG TTCTGGACGG TGCTGTGTCC ACTCTGCTGG GTGTACTGAT GCTTGCAGGG 3480
TCCGAATTTG ATTTCATTGT CAGATACTTC TTTGCCGTCC TGGCCATTCT CACCGTCTTG 3540
GGGGTTCTCA ATGGACTGGT TCTGCTGCCT GTCCTCTTAT CCTTCTTTGG ACCGTGTCCT 3600
GAGGTGTCTC CAGCCAATGG CCTAAACCGA CTGCCCACTC CTTCGCCTGA GCCGCCTCCA 3660
AGTGTCGTCC GGTTTGCCGT GCCTCCTGGT CACACGAACA ATGGGTCTGA TTCCTCCGAC 3720
TCGGAGTACA GCTCTCAGAC CACGGTGTCT GGCATCAGTG AGGAGCTCAG GCAATACGAA 3780
GCACAGCAGG GTGCCGGAGG CCCTGCCCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840
GTCTTTGCCC GGTCCACTGT GGTCCATCCG GACTCCAGAC ATCAGCCTCC CTTGACCCCT 3900
CGGCAACAGC CCCACCTGGA CTCTGGCTCC TTGTCCCCTG GACGGCAAGG CCAGCAGCCT 3960
CGAAGGGATC CCCCTAGAGA AGGCTTGCGG CCACCCCCCT ACAGACOGCG CAGAGACGCT 4020
TTTGAAATTT CTACTGAAGG GCATTCTGGC CCTAGCAATA GGGACCGCTC AGGGCCCCGT 4080
GGGGCCCGTT CTCACAACCC TCGGAACCCA ACCTCCACCG CCATGGGCAG CTCTGTGCCC 4140
AGCTACTGCC AGCCCATCAC CACTGTGACG GCTTCTCCTT CGGTGACTGT TGCTGTGCAT 4200
CCCCCGCCTG GACCTGGGCG CAACCCCCGA GGGGGGCCCT CTCCACGCTA TGAGAGCTAC 4260
CCTGAGACTG ATCACGGGGT ATTTGAGGAT CCTCATGTGC CTTTTCATGT CAGGTGTGAG 4320
AGGAGGGACT CAAAGGTGGA GGTCATAGAG CTACAGGACG TGGAATGTGA GGAGAGGCCG 4380
TGGGGGAGCA GCTCCAACTG AGGGTAATTA AAATCTGAAG CAAAGAGCCC AAAGATTGGA 4440
AAGCCCCGCC CCCACCTCTT TCCAGAACTG CTTGAAGAGA ACTGCTTGGA ATTATGGGAA 4500
GGCAGTTCAT TGTTACTGTA ACTGATTGTA TTATTKKGTG AAATATTTCT ATAAATATTT 4560
AARAGGTGTA CACATGTAAT ATACATGGAA ATGCTCTACA GTCTATTTCC TGGGGCCTCT 4620
CCACTCCTGC CCCAGAGTGC GGAGACCACA GGGGCCCTTT CCCCTGTGTA CATTGGTCTC 4680
TGTCCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CCCAGCATAT GTCGCTGCTG 4740
CTTAAATATT GTATAATTTA CTTGTATAAT TCTATCCAAA TATTGCTTAT GTAATAGGAT 4800
TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGG 4860
ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG TAATTGTTTA ACGAGCAGAC 4920
ATGAAGAAAA CACCTTAATC CCAGTGGCTT CTCTAGGGGT AGTTGTATAT GGTTCGCATG 4980
gotocatotg kwkcato T»»cmcc» ATOT.CTOTA ttotoottto ttcttcttm 5040
TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100
GTGGGCTGGG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT GGTGGAAAGG TGACCCCAAT 5160
CATCTGTCCT ATTCTCTGGG ACTATTC 5187

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1311 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Val Ala Pro Asp Ser Glu Ala Pro Ser Asn Pro ~* — ~ 

i 



5 10 



Ser Asn Pro Arg He Thr Ala 
10 15 

Ala Arg His Ser Ala Asp Leu 
30 

„ r a. «. Thr S.r «, V.! Up *U 1- «• -» "» — 

ol „ L y . «, »•» «• «■ «r «T *. ~ - « «» "* »" "* 

so 55 

„ ^ «. «. .1. - «- « 51- «- - 01 » y 

65 70 

«, Lyl V.! I* Ph. V.! ». U. LJ. V.1 U. ~ «- » C 

85 90 
val Gly Leu Lye ser Ala Gin He Hie Thr Arg Val Ae P Gin Leu Trp 

100 105 
val Gin Glu Gly Gly Arg Le« Glu Ala Glu Leu Lye Tyr Thr Ala Gin 



Ala Hie Glu S.r Pro Cye *U Thr Glu Ala Arg ^ 
20 25 



115 



120 



.„ «. »!• »U> W. s.r Thr Hi. 01. W. «1 n. «. Thr 



Ala Leu Gly oxu »-f — 14Q 

130 135 
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jjj w. -P v«i „ t _ ^ H1 . oly aii ^ ^ 

155 •» 4- ^ 

„. 160 

- u. v., v., Bi . A1 , , u thr vil rar hu ^ ^ 

175 



.u «. jp ^ ^ ly . ,. p ^ iyc s-r ^ 

185 190 

*.p *» «. „ y ^ M1 . 81 . cli , s . r n# n> >>p 

^ UU 205 

- g. A1 . lto n . Iht _ ,. p ^ pta itp oio 

220 

s - «, p~ A . P s P „ u . Iyc Vil pro hi> ^ Ly> 8i< ^ 

235 240 

- «. * jj. _ A .„ Pro L . o „, vii v>i oiu oiu ^ ^ 

u. J. „„ n . Pro tm Tht m oiu ^ ^ 2 Ly . 

265 270 
M. jj, „. Thr S . t A1 . s; Ly . Ly . p „ cy> ^ ^ ^ 

235 

Thr Asp Pro His Cys Pro Ala Thr- m „ 

290 ° JJJ Thr Ala Pr ° Asn Lys Lye S.r Gly His 

300 

JJJ - Mp v., „. jj. clu _ ^ Hi . My ^ ^ 

315 320 
«• M. ^ „ , „ ^ „. cl „ ^ 

330 335 
- «r ». S . t A1 . ^ >u Arg M ^ 

345 350 
V! «„ Jj. H« „„ 010 ^ ^ ^ Mu Iyc ^ 

360 365 
Tyr Lys Val His Gin lie G i y Tr _ Aan „ „, 

370 "J Trp A8n G l" Glu Lys Ala Ala Ala Val 

' 3 380 

a «-p «. », „„ s ly . ^ A1 . A1 . ol „ m ^ Ly< tii ihr 

395 400 
- - «y ~ jjj s. t s „ Ala ^ , P6 . ^ pro p „. 

410 415 

i« ». - «» ».p a. x- „ r g. Ph . s . c 01 „ v>1 5>r ^ 

425 430 
A.n lie XI. ^ Gly ^ ^ ^ # ^ ^ ^ ^ ^ ^ ^ 

Leu lie Gin Trp Arg Asp Pro zl . 



440 — — »r 



Arg Ser Gin Ala Gly Val Gly n. 
31 
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«0 

, lT V.! L« « «~ »r .» ~ V. t Jl. »• «, - -IT "J 

465 470 

■ c y . u. L.U L.u «T "e P~ ». « £ - «* »■ K ^ 

485 



M u «. l- «, ~ «, v.. «- m. »t m - *« 

500 505 
Thr ^ Val Glu Cln Ala Gly Asp Val Pro Arg Glu Glu Arg ~ 



Gly Le u Val Leu Lys Lys Ser Gly - Ser Val Leu - Ala Ser Leu 



530 536 



Val Met Ala Ph. L.u Ala Ala Ala Leu Leu Pro He Pro Ala 
545 550 

x\m Ala He Leu Leu Leu Phe Asn Leu 
Phe Arg Val Phe Cys Leu Gin Ala Ala lie g75 
565 

01y s.r n. l.u « v.. P.. « «. "« •» ~ « « - " 9 

580 585 
„ M s.r u. - «. »• « - - c,. c y . «. « P~ «u 

595 bUU 

s « „ « «, l,. Ly. S »• « ^ « & iy * 

610 615 

« MP „. T« «. « ~ ™ P " *" S 

625 630 

Mp v.! s.r <Hu »» V.1 Thr L y . Thr c,. «. L.u S.r S.r L.u 

645 bSU 
* Ly. Trp ». Ly. «« «. *yr JU Pro PH. U. H.t Ar, Pro U. 

660 665 

v., Ly. v,! • — - - »• "* v " ii; " u Ihr s " 

675 680 



v., w «y M. Thr Ly. v.x Ly. »p 0l y - -J - ~ " P 

690 595 
val Pro Glu Aen Thr Asp Glu His Glu Phe Leu Ser Arg Gin Glu Lye 



,05 "° 



Ph. «y Ph. Tyr »•» H« Tyr v.1 Thr ... «T ~ £ «- 

725 730 



m Pro Thr M «. Ly. - "u T»r «. Tyr Hi. «P «n Ph. v.. 

740 745 
^ xl. Pro A.n XI. Xle Ly. A.n Asp A.n Gly Gly Leu Thr Lye Phe 



755 760 
Trp Leu Ser Leu Phe Arg Asp Trp Leu Leu Asp Leu Gln Val Ala Phe

775 780 

Asp Lye Glu Val Ala Ser Gly Cys He Thr Gin n„ - „ 
785 790 7 Gln Glu TV* Tr P c y fl Lys 

795 800 

A.n A!, S.r A.p Glu Gly He Leu Al. Tyr Ly. Leu Met V.l Gin Thr 

805 815 
Gly Hi. V.1 A.p A.n Pro He A.p Ly. s. r Le„ lle Thr ^ Hia 
820 825 830 

Arg Leu V.l A.p Ly. A.p Gly lie II. A.n Pro Ly. Al. Phe Tyr A.n 
Tyr Leu Ser Al. Trp Al. Thr A.n Asp Ala Leu Al. Tyr Gly Al. Ser 



855 



860 



865 ^ L " U ^ 1% ° ln Pr ° ° ln *** ?5? Hi. ser Pro Glu 



875 



880 



A.p V.1 Hi. Leu Glu lie Ly. Ly. Ser Ser Pro Leu lie Tyr Thr Gin 

885 89S 
L.u Pro Ph. Tyr Leu Ser Gly Leu Ser A.p Thr X.a Ser Ile Ly. Thr 

905 910 



Leu II. Arg Ser V.l Arg A.p Leu Cy. Leu Ly. Tyr Glu Ala Ly. Gly 

920 925 

Leu Pro A.n Phe Pro Ser Gly II. Pro Phe Leu Phe Trp Glu Gin Tyr 

935 940 

945 ^ f~ L * U "~ L * U Al. Cy. Al. Leu Al. 

950 960 

Al. V.1 Ph. 11. Al. V.1 Met V.1 Leu Leu Leu A.n Al. Trp Al. Al. 

965 97S 

V.1 Leu Val Thr Leu Al. Leu Al. Thr Leu Val L.u Gin Leu Leu Gly 

985 990 

V.1 Met Al. Leu Leu Gly Val Ly. Leu ser Al. Met Pro Ala V.l Leu 

5 10 °0 1005 

Leu V.1 Leu Ala II. Gly ^ Gly Val ^ phe ^ ^ ^ ^ ^ 

1015 1020 
Leu Gly Phe Val Thr Ser He Gly Cya Ly. Arg Arg Arg Al. Ser Leu 

1030 "35 1040 

Al. Leu Glu Ser V.l u» Al. Pro V.l v.l His Gly Ala Leu Al. Al. 

1045 1050 1055 

Al. Leu Al. Al. Q S.r Met L.u Al. Al. Ser Glu Cy. Gly Ph. V.l Al. 

1060 1065 1070 

Arg Leu Ph. Leu Arg Leu Leu Leu A.p 11. v.l Ph. Leu Gly Leu II. 
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loan 

1075 1080 



« P cly ~u l.u nm n. Pjo n. v.i u. s„ u. «. «, «. «• 

1090 1095 

Ala Clu Val Arg Pro II, Glu His Pro Gl« Arg Leu ser Thr Pro £ 

1105 "10 1115 

»..«, ti» his Pro Ara Lya Ser Ser Ser Ser Ser Oly 
Pro Lya Cys Ser Pro lie His Pro Arg jye^ ii3fi 



1125 



01y M, »P •« s.r T„r J- W — »• « <*' 

1140 1145 
Al a Pro ser Leu Thr Thr He Thr Glu Glu Pro Ser £r Trp Hia Ser 

1155 1160 
ser Ala Hia ser Val Gin Ser Ser Met Gin Ser X1.V.1 Val Gin Pro 



1170 



1175 



Glu V.I val Val Glu Thr Thr Thr Tyr Aan Gly Ser Asp Ser Ala Sjr 
1185 I" 0 1195 

Gly Arg ser Thr Pro Thr Lys ser ser Hia Gly Gly Ala He thr Thr 

1205 1210 

Thr Ly. »1 Thr u. *ur «• »> ~ «» ~ ™ 

1220 1225 

ser Asp Arg Lya Ser Arg Arg Ser Tyr His Tyr Tyr Asp Arg Arg Arg 

1235 1240 " 

A8P Arg Asp Glu Asp Arg Asp Arg Asp Arg Glu Arg Asp Arg Asp Arg 
1250 1255 



Asp Arg Asp Arg Aap Arg Aap Arg Asp Arg teg Mp Arg Asp Ar^ 

1265 1270 

Glu Arg ser Arg Glu Arg Aap Arg Arg As^Arg Tyr Arg A. P Glu^Arg 



1285 



A sp His Arg Ala Ser Pro Arg Glu Lye Arg Gin Arg Phe TrpThr 
1300 1305 

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4434 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
CGAAACAAGA GAGCGAGTGA GAGTAGGGAG AGCGTCTGTG TTGTGTGTTG AGTGTCGCCC 60
ACGCACACAG GCGCAAAACA GTGCACACAG ACGCCCGCTG GGCAAGAGAG AGTGAGAGAG 120
AGAAACAGCG GOGCGCGCTC GCCTAATGAA GTTGTTGGCC TGGCTGGCGT GCCGCATCCA 180
CGAGATACAG ATACATCTCT CATGGACCGC GACAGCCTCC CACGCGTTCC GGACACACAC 240
GGCGATGTGG TCGATGAGAA ATTATTCTCG GATCTTTACA TACGCACCAG CTGCGTGGAC 300
GCCCAAGTGG CGCTCGATCA GATAGATAAG GGCAAAGCGC GTGGCAGCCG CACGGOGATC 360
TATCTGCGAT CAGTATTCCA GTCCCACCTC GAAACCCTOG GCAGCTCOGT GCAAAAGCAC 420
GCGGGCAAGG TGCTATTCGT GGCTATCCTG GTGCTGAGCA CCTTCTGCGT CGGCCTCAAG 480
AGCGCCCAGA TCCACTCCAA GGTGCACCAG CTCTGGATCC AGGAGGGCGG CCGGCTGGAG 540
GCGGAACTGG CCTACACACA GAAGACGATC GGCGAGGACG AGTCGGCCAC GCATCAGCTG 600
CTCATTCAGA CGACCCACGA CCCGAACGCC TCCGTCCTGC ATCCGCAGGC GCTGCTTGCC 660
CACCTGGAGG TCCTGGTCAA GCCCACCGCC GTCAAGGTGC ACCTCTACGA CACCGAATGG 720
GGGCTGCGCG ACATGTGCAA CATGCCGAGC AOGCCCTCCT TCGAGGGCAT CTACTACATC 780
GAGCAGATCC TGCGCCACCT CATTCOGTGC TCGATCATCA CGCCGCTGGA CTCTTTCTCG 840
GAGGGAAGCC AGCTGTTGGG TCCGGAATCA GCGGTCGTTA TACCAGGCCT CAACCAACGA 900
CTCCTGTGGA CCACCCTGAA TCCCGCCTCT GTGATGCAGT ATATGAAACA AAAGATGTCC 960
GAGGAAAAGA TCAGCTTCGA CTTCGAGACC GTGGAGCAGT ACATGAAGCG TGCGGCCATT 1020
GGCAGTGGCT ACATGGAGAA GCCCTGCCTG AACCCACTGA ATCCCAATTG CCCGGACACG 1080
GCACCGAACA AGAACAGCAC CCAGCCGCCG GATGTGCCAG CCATCCTGTC CGGAGGCTGC 1140
TACGGTTATG CCGCGAAGCA CATGCACTGG CCGGAGGAGC TGATTGTGGG CGGACGGAAG 1200
AGGAACCGCA GCGGACACTT GAGGAAGGCC CAGGCCCTGC AGTCGGTGGT GCACCTGATG 1260
ACCGAGAAGG AAATGTACGA CCAGTGGCAG GACAACTACA AGGTGCACCA TCTTGGATGG 1320
ACGCAGGAGA AGGCAGCGGA GGTTTTGAAC GCCTGGCAGC GCAACTTTTC GOGGGAGGTG 1380
GAACAGCTGC TACGTAAACA GTCGAGAATT GCCACCAACT ACGATATCTA CGTGTTCAGC 1440
TCGGCTGCAC TGGATGACAT CCTGGCCAAG TTCTCCCATC CCAGCGCCTT GTCCATTGTC 1500
ATCGGCGXGG CCGTCACCGT TTTGTATGCC TTTTGCACGC TCCTCCGCTG GAGGGACCCC 1560
GTCCGTGGCC AGAGCAGTGT GGGCGTGGCC GGAGTTCTGC TCATGTGCTT CAGTACCGCC 1620
GCCGGATTGG GATTGTCAGC CCTGCTCGGT ATCGTTTTCA ATGCGCTGAC CGCTGCCTAT 1680
GCGGAGAGCA ATCGGCCGGA GCAGACCAAG CTGATTCTCA AGAACGCCAG CACCCAGGTG 1740
GTTCCGTTTT TGGCCCTTGG TCTGGOCCTC GATCACATCT TCATAGTGGG ACCGAGCATC 1800
CTGTTCAGTG CCTGCAGCAC CGCAGGATCC TTCTTTGCGG CCGCCTTTAT TCCGGTGCCG 1860
GCTTTGAAGG TATTCTGTCT GCAGGCTGCC ATCGTAATGT GCTCCAATTT GGCAGCGGCT 1920
CTATTGGTTT TTCCGCCCAT GATTTCGTTG GATCTACGCA GACGTACCGC CGGCAGGGCG 1980
GACATCTTCT GCTGCTGTTT TCCGGTGTGG AAGGAACAGC CGAAGGTGGC ACCTCCGGTG 2040
CTCCCGCTGA ACAACAACAA CGGGCGCGGG GCCCGGCATC CGAAGAGCTG CAACAACAAC 2100
AGGGTGCCGC TGCCCGCCCA GAATCCTCTG CTGGAACAGA GGGCAGACAT CCCTGGGAGC 2160
AGTCACTCAC TGGCGTCCTT CTCCCTGGCA ACCTTCGCCT TTCAGCACTA CACTCCCTTC 2220
CTCATGCGCA GCTGGGTGAA GTTCCTGACC GTTATGGGTT TCCTGGCGGC CCTCATATCC 2280
AGCTTGTATG CCTCCACGCG CCTTCAGGAT GGCCTGGACA TTATTGATCT GGTGCCCAAG 2340
CACAGCAACG AGCACAAGTT CCTGGATGCT CAAACTCGGC TCTTTGGCTT CTACAGCATG 2400
TATGCGGTTA CCCAGGGCAA CTTTGAATAT CCCACCCAGC AGCAGTTGCT CAGGGACTAC 2460
CATGATTCCT TTGTGOGGGT GCCACATGTG ATCAAGAATG ATAACGGTGG ACTGCCGGAC 2520
TTCTGGCTGC TGCTCTTCAG CGAGTGGCTG GGTAATCTGC AAAAGATATT CGACGAGGAA 2580
TACCGCGACG GACGGCTGAC GAAGGAGTGC TGGTTCCCAA ACCCCAGCAG CGATGCCATC 2640
CTGGCCTACA AGCTAATCGT GCAAACCGCC CATGTGGACA ACCCCGTGGA CAAGGAACTG 2700
GTGCTCACCA ATCGCCTGCT CAACAGCGAT GGCATCATCA ACCAACGCGC CTTCTACAAC 2760
TATCTGTCGG CATGGGCCAC CAACGACGTC TTCGCCTACG GAGCTTCTCA GGGCAAATTG 2820
TATCCGGAAC CGCGCCAGTA TTTTCACCAA CCCAACGAGT ACGATCTTAA GATACCCAAG 2880
AGTCTGCCAT TGGTCTACGC TCAGATGCCC TTTTACCTCC ACGGACTAAC AGATACCTCG 2940
CAGATCAAGA CCCTGATAGG TCATATTCGC GACCTGAGCG TCAAGTACGA GGGCTTCGGC 3000
CTGCCCAACT ATCCATCGGG CATTCCCTTC ATCTTCTGGG AGCAGTACAT GACCCTGCGC 3060
TCCTCACTGG CCATGATCCT GGCCTGCGTG CTACTCGCCG CCCTGGTGCT GGTCTCCCTG 3120
CTCCTGCTCT CCGTTTGGGC CGCCGTTCTC GTGATCCTCA GCGTTCTGGC CTCGCTGGCC 3180
CAGATCTTTG GGGCCATGAC TCTGCTGGGC ATCAAACTCT CGGCCATTCC GGCAGTCATA 3240
CTCATCCTCA GCGTGGGCAT GATGCTGTGC TTCAATGTGC TGATATCACT GGGCTTCATG 3300
ACATCCGTTG GCAACCGACA GCGCCGCGTC CAGCTGAGCA TGCAGATGTC CCTGGGACCA 3360
CTTGTCCACG GCATGCTGAC CTCCGGAGTG GCCGTGTTCA TGCTCTCCAC GTCGCCCTTT 3420
GAGTTTGTGA TCCGGCACTT CTGCTGOCTT CTGCTGGTGG TCTTATGCGT TGGCGCCTGC 3480
AACAGCCTTT TGCTCTTCCC CATCCTACTG AGCATGCTGG GACCGGAGGC GGAGCTGGTG 3540
CCGCTGGAGC ATCCAGACCG CATATCCACG CCCTCTCCGC TGCCCGTCCG CACCAGCAAG 3600
AGATCGGGCA AATCCTATGT GGTGCAGGGA TCGCGATCCT CGCGAGGCAG CTGCCAGAAG 3660
TCGCATCACC ACCACCACAA AGACCTTAAT GATCCATCCC TGACGACGAT CACCGAGGAG 3720
COGCAGTCGT GGAAGTCCAG CAACTCGTCC ATCCAGATGC CCAATGATTG GACCTACCAG 3780
CCGCGGGAAC AGCGACCCGC CTCCTACGCG CCCCCGCCCC CCGCCTATCA CAAGGCCGCC 3840
GCCCACCAGC ACCACCAGCA TCAGGGCCCG CCCACAACGC CCCCGCCTCC CTTCCCCACG 3900
GCCTATCCGC CGGAGCTCCA GAGCATCGTG GTGCAGCCGG AGGTGACGGT GGAGACGACG 3960
CACTCGGACA GCAACACCAC CAAGGTGACG GCCACGGCCA ACATCAAGGT GGAGCTGGCC 4020
ATCCCCGGCA GGGCGGTGCG CAGCTATAAC TTTACGAGTT AGCACTAGCA CTACTTCCTG 4080
TAGCTATTAG GACGTATCTT TACACTCTAG CCTAAGCCGT AACCCTATTT GTATCTGTAA 4140
AATCGATTTG TCCAGCGGGT CTGCTGAGGA TTTCGTTCTC ATGGATTCTC ATGGATTCTC 4200
ATGGATGCTT AAATGGCATG GTAATTGGCA AAATATCAAT TTTTGTGTCT CAAAAAGATC 4260
CATTAGCTTA TGGTTTCAAG ATACATTTTT AAAGAGTCCG CCAGATATTT ATATAAAAAA 4320
AATCCAAAAT CGACGTATCC ATGAAAATTG AAAAGCTAAG CAGACCCGTA TGTATGTATA 4380
TGTGTATGCA TGTTAGTTAA TTTCCCGAAG TCCGGTATTT ATAGCAGCTC CCTT 4434

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1285 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Asp Arg Asp Ser Leu Pro Arg Val Pro Asp Thr His Gly Asp Val
1 5 10 15

Val Asp Glu Lys Leu Phe Ser Asp Leu Tyr Ile Arg Thr Ser Trp Val
20 25 30

Asp Ala Gln Val Ala Leu Asp Gln Ile Asp Lys Gly Lys Ala Arg Gly
35 40 45

Ser Arg Thr Ala Ile Tyr Leu Arg Ser Val Phe Gln Ser His Leu Glu
50 55 60






WO 96/11260 



PCT/US95/13233 



Thr Leu Gly Ser Ser Val Gln Lys His Ala Gly Lys Val Leu Phe Val
65 70 75 80

Ala Ile Leu Val Leu Ser Thr Phe Cys Val Gly Leu Lys Ser Ala Gln
85 90 95

Ile His Ser Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Arg Leu
100 105 110

Glu Ala Glu Leu Ala Tyr Thr Gln Lys Thr Ile Gly Glu Asp Glu Ser
115 120 125

Ala Thr His Gln Leu Leu Ile Gln Thr Thr His Asp Pro Asn Ala Ser
130 135 140

Val Leu His Pro Gln Ala Leu Leu Ala His Leu Glu Val Leu Val Lys
145 150 155 160

Ala Thr Ala Val Lys Val His Leu Tyr Asp Thr Glu Trp Gly Leu Arg
165 170 175

Asp Met Cys Asn Met Pro Ser Thr Pro Ser Phe Glu Gly Ile Tyr Tyr
180 185 190

Ile Glu Gln Ile Leu Arg His Leu Ile Pro Cys Ser Ile Ile Thr Pro
195 200 205

Leu Asp Cys Phe Trp Glu Gly Ser Gln Leu Leu Gly Pro Glu Ser Ala
210 215 220

Val Val Ile Pro Gly Leu Asn Gln Arg Leu Leu Trp Thr Thr Leu Asn
225 230 235 240

Pro Ala Ser Val Met Gln Tyr Met Lys Gln Lys Met Ser Glu Glu Lys
245 250 255

Ile Ser Phe Asp Phe Glu Thr Val Glu Gln Tyr Met Lys Arg Ala Ala
260 265 270

Ile Gly Ser Gly Tyr Met Glu Lys Pro Cys Leu Asn Pro Leu Asn Pro
275 280 285

Asn Cys Pro Asp Thr Ala Pro Asn Lys Asn Ser Thr Gln Pro Pro Asp
290 295 300

Val Gly Ala Ile Leu Ser Gly Gly Cys Tyr Gly Tyr Ala Ala Lys His
305 310 315 320

Met His Trp Pro Glu Glu Leu Ile Val Gly Gly Arg Lys Arg Asn Arg
325 330 335

Ser Gly His Leu Arg Lys Ala Gln Ala Leu Gln Ser Val Val Gln Leu
340 345 350

Met Thr Glu Lys Glu Met Tyr Asp Gln Trp Gln Asp Asn Tyr Lys Val
355 360 365

His His Leu Gly Trp Thr Gln Glu Lys Ala Ala Glu Val Leu Asn Ala
370 375 380

Trp Gln Arg Asn Phe Ser Arg Glu Val Glu Gln Leu Leu Arg Lys Gln
385 390 395 400

Ser Arg Ile Ala Thr Asn Tyr Asp Ile Tyr Val Phe Ser Ser Ala Ala
405 410 415

Leu Asp Asp Ile Leu Ala Lys Phe Ser His Pro Ser Ala Leu Ser Ile
420 425 430

Val Ile Gly Val Ala Val Thr Val Leu Tyr Ala Phe Cys Thr Leu Leu
435 440 445

Arg Trp Arg Asp Pro Val Arg Gly Gln Ser Ser Val Gly Val Ala Gly
450 455 460

Val Leu Leu Met Cys Phe Ser Thr Ala Ala Gly Leu Gly Leu Ser Ala
465 470 475 480

Leu Leu Gly Ile Val Phe Asn Ala Leu Thr Ala Ala Tyr Ala Glu Ser
485 490 495

Asn Arg Arg Glu Gln Thr Lys Leu Ile Leu Lys Asn Ala Ser Thr Gln
500 505 510

Val Val Pro Phe Leu Ala Leu Gly Leu Gly Val Asp His Ile Phe Ile
515 520 525

Val Gly Pro Ser Ile Leu Phe Ser Ala Cys Ser Thr Ala Gly Ser Phe
530 535 540

Phe Ala Ala Ala Phe Ile Pro Val Pro Ala Leu Lys Val Phe Cys Leu
545 550 555 560

Gln Ala Ala Ile Val Met Cys Ser Asn Leu Ala Ala Ala Leu Leu Val
565 570 575

Phe Pro Ala Met Ile Ser Leu Asp Leu Arg Arg Arg Thr Ala Gly Arg
580 585 590

Ala Asp Ile Phe Cys Cys Cys Phe Pro Val Trp Lys Glu Gln Pro Lys
595 600 605

Val Ala Pro Pro Val Leu Pro Leu Asn Asn Asn Asn Gly Arg Gly Ala
610 615 620

Arg His Pro Lys Ser Cys Asn Asn Asn Arg Val Pro Leu Pro Ala Gln
625 630 635 640

Asn Pro Leu Leu Glu Gln Arg Ala Asp Ile Pro Gly Ser Ser His Ser
645 650 655

Leu Ala Ser Phe Ser Leu Ala Thr Phe Ala Phe Gln His Tyr Thr Pro
660 665 670

Phe Leu Met Arg Ser Trp Val Lys Phe Leu Thr Val Met Gly Phe Leu
675 680 685

Ala Ala Leu Ile Ser Ser Leu Tyr Ala Ser Thr Arg Leu Gln Asp Gly
690 695 700



MO M , n. ~ -P - ~ >" - S - H " "0 

705 

nv Phe Tvr Ser Met Tyr Ala Val 
L eu Asp Ala Gin Thr Arg Leu Phe Gly Phe Tyr ^ 

725 

Bto elu pro Thr Gin Gin Gin Leu Leu Arg Asp 
Thr Gin Gly Asn Phe Olu Tyr rr ?50 
740 

•h. Ara Val Pro His Val He Lye Asn Asp Asn Gly 
Tyr His Asp Ser Phe Arg Val *r 7fi5 
755 

to» Leu Phe ser Glu Trp Leu Gly Asn 
Oly Leu Pro Asp Phe Trp Leu Leu Leu 7gQ 

770 

ua W . u- - -J «• «" * « a ° ly " 9 ~ ^ £ 

I c. «, - « « »• ~ « IS "* ,u " u E 

805 

uu tl . v.. «. ~ «■ g « « "> iSS 

820 

~ v Tie He Asn Gin Arg 

val Leu Thr Asn Arg Leu Val Asn Ser Asp Gly ^ 

835 

* t«u ser Ala Trp Ala Thr Asn Asp Val Phe Ala 
Ala Phe Tyr Asn Tyr Leu Ser Ala P ^ 

850 855 

•1. Gly Lys Leu Tyr Pro Glu Pro Arg Gin Tyr Phe 

870 

t„« Tie Pro Lys ser Leu Pro Leu 
Glu Tyr Asp Leu Lye lie Pro L.y ^ 

885 

« ^ .u - « - * s ai - " u Thr s Ihc s ' c 

900 * 

.„ ue jj. - - n. «v ;u xx. « -p - HI - - - 

.u «, ~ «» - « s * - s « °» u: rre phe "* n ' 

930 9J 

ser ser Leu Ala Met lie Leu Ala 
Trp Glu Gin Tyr Met Thr Leu Arg Ser Ser L g6Q 

945 950 



Tyr Gly Ala Ser Gin **y 87S 

865 

Hi. Gin Pro Asn Glu Tyr «»p «- --- 

885 



c v., « ~ - «• - « l - Si ~ ~ ~ m S " 

Val Leu Ala Ser Leu Ala 



Val Trp Ala Ala Val Le« Val He Leu Ser ^ 

980 



Gin He Phe 
995 1000 1005



Jo?o Val ^ c S6r ° ly Met MSt Leu <*- «» Asn 

1015 102 o 

Val Leu Ile Ser Leu Gly Phe Met Thr Ser Val Gly Asn Arg Gln Arg
1025 1030 1035 1040

Arg Val Gln Leu Ser Met Gln Met Ser Leu Gly Pro Leu Val His Gly
1045 1050 1055

Met Leu Thr Ser Gly Val Ala Val Phe Met Leu Ser Thr Ser Pro Phe
1060 1065 1070

Glu Phe Val Ile Arg His Phe Cys Trp Leu Leu Leu Val Val Leu Cys
1075 1080 1085

VAl ?ion A1 ' A " n Ser L#U L * u Val ph * p "> I^u Leu ser Met 
1090 1095 1100

Val Gly Pro Glu Ala Glu Leu Val Pro Leu Glu His Pro Asp Arg Ile
1105 1110 1115 1120

Ser Thr Pro Ser Pro Leu Pro Val Arg Ser Ser Lys Arg Ser Gly Lys
1125 1130 1135

Ser Tyr Val Val Gln Gly Ser Arg Ser Ser Arg Gly Ser Cys Gln Lys
1140 1145 1150

Ser His His His His His Lys Asp Leu Asn Asp Pro Ser Leu Thr Thr
1155 1160 1165

Ile Thr Glu Glu Pro Gln Ser Trp Lys Ser Ser Asn Ser Ser Ile Gln
1170 1175 1180

Met Pro Asn Asp Trp Thr Tyr Gln Pro Arg Glu Gln Arg Pro Ala Ser
1185 1190 1195 1200

Tyr Ala Ala Pro Pro Pro Ala Tyr His Lys Ala Ala Ala Gln Gln His
1205 1210 1215

His Gln His Gln Gly Pro Pro Thr Thr Pro Pro Pro Pro Phe Pro Thr
1220 1225 1230

Ala Tyr Pro Pro Glu Leu Gln Ser Ile Val Val Gln Pro Glu Val Thr
1235 1240 1245

Val Glu Thr Thr His Ser Asp Ser Asn Thr Thr Lys Val Thr Ala Thr
1250 1255 1260

Ala Asn Ile Lys Val Glu Leu Ala Met Pro Gly Arg Ala Val Arg Ser
1265 1270 1275 1280

Tyr Asn Phe Thr Ser
1285

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 345 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



AAGGTCCATC AGCTTTGGAT ACAGGAAGGT GGTTCGCTCG AGCATGAGCT AGCCTACACG 60
CAGAAATCGC TCGGCGAGAT GGACTCCTCC ACGCACCAGC TGCTAATCCA AACNCCCAAA 120
GATATGGACG CCTCGATACT GCACCCGAAC GCGCTACTGA CGCACCTGGA CGTGGTGAAG 180
AAAGCGATCT CGGTGACGGT GCACATGTAC GACATCACGT GGAGNCTCAA GGACATGTGC 240
TACTCGCCCA GCATACCGAG NTTCGATACG CACTTTATCG AGCAGATCTT CGAGAACATC 300
ATACCGTGCG CGATCATCAC GCCGCTGGAT TGCTTTTGGG AGGGA 345



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Ser Leu Glu His Glu
1 5 10 15

Leu Ala Tyr Thr Gln Lys Ser Leu Gly Glu Met Asp Ser Ser Thr His
20 25 30

Gln Leu Leu Ile Gln Thr Pro Lys Asp Met Asp Ala Ser Ile Leu His
35 40 45

Pro Asn Ala Leu Leu Thr His Leu Asp Val Val Lys Lys Ala Ile Ser
50 55 60

Val Thr Val His Met Tyr Asp Ile Thr Trp Xaa Leu Lys Asp Met Cys
65 70 75 80

Tyr Ser Pro Ser Ile Pro Xaa Phe Asp Thr His Phe Ile Glu Gln Ile
85 90 95

Phe Glu Asn Ile Ile Pro Cys Ala Ile Ile Thr Pro Leu Asp Cys Phe
100 105 110

Trp Glu Gly
115
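
The Xaa entries in SEQ ID NO: 8 line up with the N positions in SEQ ID NO: 7. The following is a minimal sketch, not part of the original listing, of how such a translation could be reproduced. It assumes the standard genetic code, treats N as "any base", and reports Xaa only when the possible codons disagree; the function names and the "Ter" label for stop codons are illustrative choices.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",  # stop codon (illustrative label)
}


def translate_codon(codon):
    """Translate one codon; N stands for any base, ambiguous results give Xaa."""
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    if len(residues) == 1:
        return THREE_LETTER[residues.pop()]
    return "Xaa"


def translate(dna):
    """Translate a DNA string in frame 1, ignoring spaces and a trailing partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return [translate_codon(dna[i:i + 3]) for i in range(0, usable, 3)]


# Three consecutive 10-base blocks from SEQ ID NO: 7 (positions 211-240).
fragment = "GACATCACGT GGAGNCTCAA GGACATGTGC"
print(" ".join(translate(fragment)))
```

A codon such as ACN still translates to Thr because all four possibilities encode the same residue, whereas AGN covers both Ser and Arg codons and is therefore reported as Xaa.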

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGGTCTGTCA CCCGGAGCCG GACTCCCCGG CGGCCAGCAG CGTCCTCGCG AGCCGAGCGC 60
CCAGGCGCGC CCGGAGCCCG CGGCGGCGGC CGCAACATGG CCTCGGCTGG TAACGCCGCC 120
GGGGCCCTGG GCACGCAGGC CGGCGGCGGG AGGCGCAGAC GGACCGGGGG ACCGCACCGC 180
GCCGCCCCGG ACCGGGACTA TCTGCACCGG CCCAGCTACT GCGACGCCGC CTTCGCTCTG 240
GAGCAGATTT CCAAGGGGAA GGCTACTCGC CGGAAAGCGC CGCTGTGGCT GAGAGCGAAG 300
TTTCAGAGAC TCTTATTTAA ACTGGGTTGT TACATTCAAA AGAACTGCCG CAAGTTTTTG 360
GTTGTGGGTC TCCTCATATT TGGGGCCTTC GCTGTGGCAT TAAACGCAGC TAATCTCGAG 420
ACCAACGTGG AGGAGCTGTC GCTGGAAGTT GGTGGACGAG TGAGTCGAGA ATTAAATTAT 480
ACCCGTCAGA AGATAGGAGA AGAGCCTATG TTTAATCCTC AACTCATGAT ACAGACTCCA 540
AAAGAAGAAG GCGCTAATGT TCTCACCACA GAGGCTCTCC TGCAACACCT GGACTCAGCA 600
CTCCAGGCCA GTCGTGTGCA CGTCTACATG TATAACAGGC AATGGAAGTT GGAACATTTG 660
TGCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 720
CTTTACCCTT GCTTAATCAT TACACCTTTG GACTGCTTCT GGGAAGGGGC AAAGCTACAG 780
TCCGGGACAC CATACCTCCT AGGTAAGCCT CCTTTACGGT GCACAAACTT TGACCCCTTG 840
GAATTCCTAG AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGCTGGGA GGAAATGCTG 900
AATAAAGCCG AAGTTGGCCA TGGGTACATG CACCGGCCTT GCCTCAACCC ACCCGACCCA 960
GATTGCCCTG CCACACCCCC TAACAAAAAT TCAACCAAAC CTCTTGATGT GGCCCTTGTT 1020
TTGAATGGTG GATGTCAAGG TTTATCCAGG AAGTATATGC ATTGGCAGCA GGAGTTGATT 1080
GTGGGTGGTA CCGTCAAGAA TCCCACTGGA AAACTTGTCA GCGCTCACGC CCTCCAAACC 1140
ATGTTCCAGT TAATGACTCC CAAGCAAATG TATGAACACT TCAGGGGCTA CGACTATGTC 1200
TCTCACATCA ACTGGAATGA AGACAGCGCA GCCGCCATCC TGGAGGCCTG GCAGAGGACT 1260
TACGTGGAGG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 1320
ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTC ATGTCAGTGT CATCCGAGTG 1380
GCCAGCGGCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440
TCCAAGTCCC AGGGTGCCGT GGGGCTGGCT GGCGTCCTGT TGGTTGCGCT GTCAGTGGCT 1500
GCAGGATTGG GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 1560
^OCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620
AGTCAAACAG GACAGAATAA GAGGATTCCA TTTGAGGACA GGACTGGGGA GTGCCTCAAG 1680
CGCACCGCAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGGCC 1740
GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800
TTCAATTTTG CTATGGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860
CGTGAGGACA GAAGATTGGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980
CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040
CACCTCCGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 2100
TCTGAGATCT CTGTACAGCC TGTTACCGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160
GAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220
CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280
TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGG 2340
GTCAGCCTTT ATGGGACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGGA CATTGTTCCC 2400
COOGAAACCA GAGAATATGA CTTCATAGCT GCCCAGXTCA AGTACTTCTC TTTCTACAAC 2460
ATGTATATAG TCACCCAGAA AGCAGACTAC CCCAATATCC AGCACCTACT TTACGACCTT 2520
CATAAGAGTT TCAGCAATGT GAAGTATGTC ATGCTGGAGG AGAACAAGCA ACTTCCCCAA 2580
ATGTGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640
TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCAGATGA CGGGGTCCTC 2700
GCTTACAAAC TCCTGGTGCA GACTGGCAGC OGAGACAAGC CCATCGACAT TAGTCAGTTG 2760
ACTAAACAGC GTCTGGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820
CTGACCGCTT GGGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCCCG 2880




CCTCACCGGC CGCAGTGCGT CCATGACAAA GCCGACTACA TCCCAGAGAC CAGCCTGAGA 2940 

ATCCCAGCAG CAGAGCCCAT CGAGTACGCT CAGTTCCCTT TCTACCTCAA CCGCCTACGA 3000 

GACACCTCAG ACTTTGTGGA AGCCATAGAA AAAGTGAGAG TCATCTGTAA CAACTATACG 3060 

AGCCTGGGAC TGTCCAGCTA CCCCAATGGC TACCCCTTCC TGTTCTGGGA GCAATACATC 3120 

AGCCTGCGCC ACTGGCTGCT GCTATCCATC AGCGTGGTGC TCGCCTGCAC GTTTCTAGTG 3180 

TGCCCAGTCT TCCTCCTGAA CCCCTGGACG GCCGGGATCA TTCTCATGGT CCTGGCTCTG 3240 

ATGACCGTTG AGCTCTTTGG CATGATGGGC CTCATTGGGA TCAAGCTGAG TGCTGTGCCT 3300 

GTGGTCATCC TGATTGCATC TGTTGGCATC GGAGTGGAGT TCACCGTCCA CGTGGCTTTG 3360 

GCCTTTCTGA CAGCCATTGG GGACAAGAAC CACAGGGCTA TGCTCGCTCT GGAACACATG 3420 

TTTGCTCCCG TTCTGGACGG TGCTGTGTCC ACTCTGCTGG GTCTACTGAT GCTTGCAGGG 3480 

TCCGAATTTG ATTTCATTGT CAGATACTTC TTTGCCGTCC TGGCCATTCT CACCGTCTTG 3540 

GGGGTTCTCA ATGGACTGGT TCTGCTGCCT GTCCTCTTAT CCTTCTTTGG ACCGTGTCCT 3600 

GAGGTGTCTC CAGCCAATGG CCTAAACCGA CTGCCCACTC CTTCGCCTGA GCCGCCTCCA 3660 

AGTGTCGTCC GGTTTGCCGT GCCTCCTGGT CACACGAACA ATGGGTCTGA TTCCTCCGAC 3720 

TCGGAGTACA GCTCTCAGAC CACGGTGTCT GGCATCAGTG AGGAGCTCAG CCAATACCAA 3780 

GCACAGCAGG GTGCCGGAGG CCCTGCCCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840 

GTCTTTCCCC GGTCCACTGT GGTCCATCCG GACTCCAGAC ATCAGCCTCC CTTGACCCCT 3900 

CGGCAACAGC CCCACCTGGA CTCTGGCTCC TTGTCCCCTG GACGGCAAGG CCAGCAGCCT 3960 

CGAAGGGATC CCCCTAGAGA AGGCTTGCGG CCACCCCCCT ACAGACCGCG CAGAGACGCT 4020 

TTTGAAATTT CTACTGAAGG GCATTCTGGC CCTAGCAATA GGGACCGCTC AGGGCCCCGT 4080 

GGGGCCCGTT CTCACAACCC TCGGAACCCA ACGTCCACCG CCATGGGCAG CTCTGTGCCC 4140 

AGCTACTGCC AGCCCATCAC CACTGTGACG GCTTCTGCTT CGGTGACTGT TGCTGTGCAT 4200 

CCCCCGCCTG GACCTGCGCG CAACCCCCGA GGGGGGCCCT GTCCAGGCTA TGAGAGCTAC 4260 

CCTGAGACTG ATCACGGGGT ATTTGAGGAT CCTCATGTGC CTTTTCATGT CAGGTGTGAG 4320 

AGGAGGGACT CAAAGGTGCA GGTCATAGAG CTACAGGACG TGGAATGTGA GGAGAGGCCG 4380 

TGGGGGAGCA GCTCCAACTG AGGGTAATTA AAATCTGAAG CAAAGAGGCC AAAGATTGGA 4440 

AAGCCCCGCC CCCACCTCTT TCCAGAACTG CTTGAAGAGA ACTGCTTGGA ATTATGGGAA 4500 

GGCAGTTCAT TGTTACTGTA ACTGATTGTA TTATTKKGTG AAATATTTCT ATAAATATTT 4560 

AARAGGTGTA CACATCTAAT ATACATGGAA ATGCTGTACA GTCTATTTCC TGGGGCCTCT 4620 
CCACTCCTGC CCCAGAGTGG GGAGACCACA GGGCCCCTTT CCCCTGTGTA CATTGGTCTC 4680

TGTGCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CCCAGCATAT GTCGCTGCTG 4740

CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTGCTTAT GTAATAGGAT 4800

TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGG 4860

ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG TAATTGTTTA ACGAGCAGAC 4920

ATGAAGAAAA CAGGTTAATC CCAGTGGCTT CTCTAGGGGT AGTTGTATAT GGTTCGCATG 4980

GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TTGTGGTTTG TTGTTGTTGT 5040

TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100

GTGGGCTGGG AAGGTCCAGC TCTTTTTCTG TCGTGATGCT GGTGGAAAGG TGACCCCAAT 5160

CATCTGTCCT ATTCTCTGGG ACTATTC 5187


(2) INFORMATION FOR SEQ ID NO: 10: 









(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1434 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Ala Ser Ala Gly Asn Ala Ala Gly Ala Leu Gly Arg Gln Ala Gly
1 5 10 15

Gly Gly Arg Arg Arg Arg Thr Gly Gly Pro His Arg Ala Ala Pro Asp
20 25 30

Arg Asp Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu
35 40 45

Glu Gln Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp
50 55 60

Leu Arg Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile
65 70 75 80

Gln Lys Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly
85 90 95

Ala Phe Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu
100 105 110

Glu Leu Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr
115 120 125
Thr Arg Gln Lys Ile Gly Glu Glu Ala Met Phe Asn Pro Gln Leu Met
130 135 140

Ile Gln Thr Pro Lys Glu Glu Gly Ala Asn Val Leu Thr Thr Glu Ala
145 150 155 160

Leu Leu Gln His Leu Asp Ser Ala Leu Gln Ala Ser Arg Val His Val
165 170 175

Tyr Met Tyr Asn Arg Gln Trp Lys Leu Glu His Leu Cys Tyr Lys Ser
180 185 190

Gly Glu Leu Ile Thr Glu Thr Gly Tyr Met Asp Gln Ile Ile Glu Tyr
195 200 205

Leu Tyr Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
210 215 220

Ala Lys Leu Gln Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu
225 230 235 240

Arg Trp Thr Asn Phe Asp Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys
245 250 255

Ile Asn Tyr Gln Val Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu
260 265 270

Val Gly His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro
275 280 285

Asp Cys Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp
290 295 300

Val Ala Leu Val Leu Asn Gly Gly Cys Gln Gly Leu Ser Arg Lys Tyr
305 310 315 320

Met His Trp Gln Glu Glu Leu Ile Val Gly Gly Thr Val Lys Asn Ala
325 330 335

Thr Gly Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu
340 345 350

Met Thr Pro Lys Gln Met Tyr Glu His Phe Arg Gly Tyr Asp Tyr Val
355 360 365

Ser His Ile Asn Trp Asn Glu Asp Arg Ala Ala Ala Ile Leu Glu Ala
370 375 380

Trp Gln Arg Thr Tyr Val Glu Val Val His Gln Ser Val Ala Pro Asn
385 390 395 400

Ser Thr Gln Lys Val Leu Pro Phe Thr Thr Thr Thr Leu Asp Asp Ile
405 410 415

Leu Lys Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr
420 425 430

Leu Leu Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys
435 440 445

Ser Lys Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala
450 455 460

Leu Ser Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser
465 470 475 480

Phe Asn Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val
485 490 495

Gly Val Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly
500 505 510

Gln Asn Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys
515 520 525

Arg Thr Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala
530 535 540

Phe Phe Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser
545 550 555 560

Leu Gln Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu
565 570 575

Ile Phe Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg
580 585 590

Arg Leu Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val
595 600 605

Ile Gln Val Glu Pro Gln Ala Tyr Thr Glu Pro His Ser Asn Thr Arg
610 615 620

Tyr Ser Pro Pro Pro Pro Tyr Thr Ser His Ser Phe Ala His Glu Thr
625 630 635 640

His Ile Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro
645 650 655

His Thr His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser
660 665 670

Val Gln Pro Val Thr Val Thr Gln Asp Asn Leu Ser Cys Gln Ser Pro
675 680 685

Glu Ser Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser
690 695 700

Ser Leu His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser
705 710 715 720

Phe Ala Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys
725 730 735

Val Val Val Ile Leu Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr
740 745 750
Gly Thr Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro
755 760 765

Arg Glu Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe
770 775 780

Ser Phe Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn
785 790 795 800

Ile Gln His Leu Leu Tyr Asp Leu His Lys Ser Phe Ser Asn Val Lys
805 810 815

Tyr Val Met Leu Glu Glu Asn Lys Gln Leu Pro Gln Met Trp Leu His
820 825 830

Tyr Phe Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp
835 840 845

Trp Glu Thr Gly Arg Ile Met Pro Asn Asn Tyr Lys Asn Gly Ser Asp
850 855 860

Asp Gly Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp
865 870 875 880

Lys Pro Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val Asp Ala
885 890 895

Asp Gly Ile Ile Asn Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp
900 905 910

Val Ser Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg
915 920 925

Pro His Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu
930 935 940

Thr Arg Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe
945 950 955 960

Pro Phe Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala
965 970 975

Ile Glu Lys Val Arg Val Ile Cys Asn Asn Tyr Thr Ser Leu Gly Leu
980 985 990

Ser Ser Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile
995 1000 1005

Ser Leu Arg His Trp Leu Leu Leu Ser Ile Ser Val Val Leu Ala Cys
1010 1015 1020

Thr Phe Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly
1025 1030 1035 1040

Ile Ile Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met
1045 1050 1055

Met Gly Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu
1060 1065 1070

Ile Ala Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu
1075 1080 1085

Ala Phe Leu Thr Ala Ile Gly Asp Lys Asn His Arg Ala Met Leu Ala
1090 1095 1100

Leu Glu His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu
1105 1110 1115 1120

Leu Gly Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg
1125 1130 1135

Tyr Phe Phe Ala Val Leu Ala Ile Leu Thr Val Leu Gly Val Leu Asn
1140 1145 1150

Gly Leu Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Cys Pro
1155 1160 1165

Glu Val Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro
1170 1175 1180

Glu Pro Pro Pro Ser Val Val Arg Phe Ala Val Pro Pro Gly His Thr
1185 1190 1195 1200

Asn Asn Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr
1205 1210 1215

Val Ser Gly Ile Ser Glu Glu Leu Arg Gln Tyr Glu Ala Gln Gln Gly
1220 1225 1230

Ala Gly Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro
1235 1240 1245

Val Phe Ala Arg Ser Thr Val Val His Pro Asp Ser Arg His Gln Pro
1250 1255 1260

Pro Leu Thr Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Ser
1265 1270 1275 1280

Pro Gly Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly
1285 1290 1295

Leu Arg Pro Pro Pro Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser
1300 1305 1310

Thr Glu Gly His Ser Gly Pro Ser Asn Arg Asp Arg Ser Gly Pro Arg
1315 1320 1325

Gly Ala Arg Ser His Asn Pro Arg Asn Pro Thr Ser Thr Ala Met Gly
1330 1335 1340

Ser Ser Val Pro Ser Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser
1345 1350 1355 1360

Ala Ser Val Thr Val Ala Val His Pro Pro Pro Gly Pro Gly Arg Asn
1365 1370 1375

Pro Arg Gly Gly Pro Cys Pro Gly Tyr Glu Ser Tyr Pro Glu Thr Asp
1380 1385 1390

His Gly Val Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu
1395 1400 1405

Arg Arg Asp Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys
1410 1415 1420

Glu Glu Arg Pro Trp Gly Ser Ser Ser Asn
1425 1430

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
1 5 10 
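
For comparing short peptides such as SEQ ID NO: 11 against database entries, the three-letter notation is often converted to one-letter codes. The following is a minimal sketch of such a conversion, not part of the original listing. It reads the peptide as Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly, and the helper names are hypothetical. Tokens that are not valid three-letter codes are reported rather than guessed, which also helps to flag transcription damage.

```python
# Three-letter to one-letter amino acid codes; Xaa marks an unknown residue.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def unknown_tokens(peptide):
    """Return the tokens that are not recognised three-letter codes."""
    return [tok for tok in peptide.split() if tok.capitalize() not in THREE_TO_ONE]


def to_one_letter(peptide):
    """Convert a space-separated three-letter peptide to a one-letter string."""
    bad = unknown_tokens(peptide)
    if bad:
        raise ValueError(f"unrecognised residue codes: {bad}")
    return "".join(THREE_TO_ONE[tok.capitalize()] for tok in peptide.split())


seq_id_11 = "Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly"
print(to_one_letter(seq_id_11))
```

The same helper can be pointed at any amino acid row of the listing; a non-empty result from `unknown_tokens` indicates a residue code that needs to be checked against the nucleotide sequence.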

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Leu Ile Val Gly Gly
1 5 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Pro Phe Phe Trp Glu Gin Tyr 
1 5 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GGACGAATTC AARGTNCAYC ARYTNTGG 
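
The primer of SEQ ID NO: 14 uses degenerate base codes (R, Y, N). The following is a minimal sketch, not part of the original listing, of how the residues covered by each degenerate codon could be enumerated. It assumes the standard genetic code and standard IUPAC nucleotide codes, and it assumes that the first ten bases (GGACGAATTC, which contain a GAATTC restriction site) form a non-coding 5' tail; the function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(seq):
    """Number of distinct oligonucleotides represented by a degenerate sequence."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def codon_options(seq):
    """For each complete codon, list the one-letter residues it can encode."""
    result = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        choices = product(*(IUPAC[b] for b in seq[i:i + 3]))
        residues = {CODON_TABLE["".join(c)] for c in choices}
        result.append("".join(sorted(residues)))
    return result


primer = "GGACGAATTCAARGTNCAYCARYTNTGG"  # SEQ ID NO: 14, blocks joined
core = primer[10:]                       # assumed coding portion after the tail
print(codon_options(core))
print(degeneracy(core))
```

Under these assumptions the coding portion reads Lys, Val, His, Gln, Leu or Phe, Trp, which agrees with the first six residues of SEQ ID NO: 8. The YTN codon covers all six leucine codons at the cost of also matching the two phenylalanine codons.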
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GGACGAATTC CYTCCCARAA RCANTC 
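
The primer of SEQ ID NO: 15 appears to be written in the antisense orientation, so its coding sense becomes visible only after reverse complementation. The following is a minimal sketch of that step for a degenerate sequence, not part of the original listing. It assumes standard IUPAC complement pairs and again assumes that the first ten bases form a 5' tail; the names are illustrative.

```python
# Complements of IUPAC nucleotide codes (R/Y and K/M swap; S, W and N are self-complementary).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "N": "N",
    "B": "V", "V": "B", "D": "H", "H": "D",
}


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC degenerate codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


primer = "GGACGAATTCCYTCCCARAARCANTC"  # SEQ ID NO: 15, blocks joined
core = primer[10:]                     # assumed template-binding portion after the tail
sense = reverse_complement(core)
print(sense)
print([sense[i:i + 3] for i in range(0, len(sense), 3)])
```

Read in codons, the sense strand is GAN TGY TTY TGG GAR G, that is Asp or Glu, Cys, Phe, Trp, Glu and the first base of a Gly codon, which is consistent with the Asp Cys Phe Trp Glu Gly motif of SEQ ID NO: 11.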
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GGACGAATTC YTNGANTGYT TYTGGGA
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CATACCAGCC AAGCTTGTCN GGCCARTGCA T 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5288 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GAATTCCGGG GACCGCAAGG AGTGCCGCGG AAGCGCCCGA AGGACACGCT CGCTCGCCGC 60
GCCGGCTCTC GCTCTTCCCC GAACTGGATG TGGGCAGCGG CGGCCGCAGA GACCTCGCGA 120
CCCCCGCGCA ATGTGGCAAT GGAAGGCGCA GGGTCTGACT CCCCGGCAGC GGCCGCGGCC 180
GCAGCGGCAG CAGCGCCCGC CGTGTGAGCA GCAGCAGCGC CTGGTCTGTC AACCGGAGCC 240
CGAGCCCGAG CAGCCTGCGG CCAGCAGCGT CCTCGCAAGC CGAGCGCCCA GGCGCGCCAG 300
GAGCCCGCAG CAGCGGCAGC AGCGCGCCGG GCCGCCCGGG AAGCCTCCGT CCCCGCGGCG 360
GCGGCGGCGG CGGCGGCGGC AACATGCCCT CGGCTGGTAA CGCCGCCCAG CCCCAGGACC 420
GCGGCGGCGG CGGCAGCGGC TGTATCGGTG CCCCGGGACG GCCGGCTGGA GGCGGGAGGC 480
GCAGACGGAC GGGGGGGCTG CGCCGTGCTG CCGCGCCGGA CCGGGACTAT CTGCACCGGC 540 
CCAGCTACTG CGACGCCGCC TTCGCTCTGG AGCAGATTTC CAAGGGGAAG GCTACTGCCC 600 
GGAAAGCGCC ACTGTGGCTG AGAGCGAAGT TTCAGAGACT CTTATTTAAA CTGGGTTGTT 660 
ACATTCAAAA AAACTGCGGC AAGTTCTTGC TTGTGCGCCT CCTCATATTT GGGGCCTTCG 720 
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CGGTGOGATT AAAAGCAGCG AACCTCGAGA CCAACGTGGA GGAGCTGTGG GTGGAAGTTG 
GAGGACGAGT AAGTCGTGAA TTAAATTATA CTCGCCAGAA GATTGGAGAA GAGGCTATGT 
TTAATCCTCA ACTCATGATA CAGACCCCTA AAGAAGAAGG TGCTAATGTC CTGACCACAG 
AAGCGCTCCT ACAACACCTG GACTCGGCAC TCCAGGCCAG CCGTGTCCAT GTATACATGT 
ACAACAGGCA GTGGAAATTG GAACATTTGT GTTACAAATC AGGAGAGCTT ATCACAGAAA 
CAGGTTACAT GGATCAGATA ATAGAATATC TTTACCCTTG TTTGATTATT ACACCTTTGG 
ACTGCTTCTG GGAAGGGGCG AAATTACAGT CTGGGACAGC ATACCTCCTA GGTAAACCTC 
CTTTGCGGTG GACAAACTTC GACCCTTTGG AATTCCTGGA AGAGTTAAAG AAAATAAACT 
ATCAAGTGGA CAGCTGGGAG GAAATGCTGA ATAAGGCTGA GGTTGGTCAT GGTTACATGG 
ACCGCCCCTG CCTCAATCCG GCCGATCCAG ACTGCCCCGC CACAGCCCCC AACAAAAATT 
CAACCAAACC TCTTGATATG GCCCTTGTTT TGAATGGTGG ATGTCATGGC TTATCCAGAA 
AGTATATGCA CTGGCAGGAG GAGTTGATTG TGGGXGGCAC AGTCAAGAAC AGCACTGGAA 
AACTCGTCAG CGCCCATGCC CTGCAGACCA TGTTCCAGTT AATGACTCCC AAGCAAATGT 
ACGAGCACTT CAAGGGGTAC GAGTATGTCT CACACATCAA CTGGAACGAG GACAAAGCGG 
CAGCCATCCT GGAGGCCTGG CAGAGGACAT ATGTGGAGGT GGTTCATCAG AGTGTCGCAC 
AGAACTCCAC TCAAAAGGTG CTTTCCTTCA CCACCACGAC CCTGGACGAC ATCCTGAAAT 
CCTTCTCTGA CGTCAGTGTC ATCCGCGTGG CCAGCGGCTA CTTACTCATG CTCGCCTATG 
CCTGTCTAAC CATGCTGCGC TGGGACTGCT CCAAGTCCCA GGGTGCCGTG GGGCTGGCTG 
GCGTCCTGCT GGTTGCACTG TCAGTGGCTG CAGGACTGGG CCTGTGCTCA TTGATCGGAA 
TTTCCTTTAA CGCTGCAACA ACTCAGGTTT TGCCATTTCT CGCTCTTGGT GTTGGTGTGG 
ATGATGTTTT TCTTCTGGCC CACGCCTTCA GTGAAACAGG ACAGAATAAA AGAATCCCTT 
TTGAGGACAG GACCGGGGAG TGCCTGAAGC GCACAGGAGC CAGCGTGGCC CTCACGTCCA 
TCAGCAATGT CACAGCCTTC TTCATGGCCG CGTTAATCCC AATTCCCGCT CTGCCGGCGT 
TCTCCCTCCA GGCAGCGGTA GTAGTGGTGT TCAATTTTGC CATGGTTCTG CTCATTTTTC 
CTGCAATTCT CAGCATGGAT TTATATCGAC GCGAGGACAG GAGACTGGAT ATTTTCTGCT 
GTTTTACAAG CCCCTGCGTC AGCAGAGTGA TTCAGGTTGA ACCTCAGGCC TACACCGACA 
CACACGACAA TACCCGCTAC AGCCCCCCAC CTCCCTACAG CAGCCACAGC TTTGCCCATG 
AAACGCAGAT TACCATGCAG TCCACTGTCC AGCTCCGCAC GGAGTACGAC CCCCACACGC 
ACGTGTACTA CACCACCGCT GAGCCGCGCT CCGAGATCTC TGTGCAGCCC GTCACCGTGA 
CACAGGACAC CCTCAGCTGC GAGAGCCCAG AGAGCACCAG CTCCACAAGG GACCTGCTCT 2520 
CCCAGTTCTC CGACTCCAGC CTCCACTGCC TCGAGCCCCC CTGTACGAAG TGGACACTCT 2580 
CATCTTTTGC TGAGAAGCAC TATGCTCCTT TCCTCTTGAA ACCAAAAGCC AAGGTAGTGG 2640 
TGATCTTCCT TTTTCTGGGC TTGCTGGGGG TCAGCCTTTA TGGCACCACC CGAGTGAGAG 2700 
ACGGGCTGGA CCTTACGGAC ATTGTACCTC GGGAAACGAG AGAATATGAC TTTATTGCTG 2760 
CACAATTCAA ATACTTTTCT TTCTACAACA TGTATATACT CACCCAGAAA GCACACTACC 2820 
CGAATATCCA GCACTTACTT TACGACCTAC ACAGGAGTTT CAGTAACGTG AAGTATGTCA 2880 
TGTTGGAAGA AAACAAACAG CTTCCCAAAA TGTGGCTGCA CTACTTCAGA GACTGGCTTC 2940 
AGGGACTTCA GGATGCATTT GACAGTGACT GGGAAACCGG GAAAATCATG CCAAACAATT 3000 
ACAAGAATGG ATCAGACGAT GGAGTCCTTG CCTACAAACT CCTGGTGCAA ACCGGCAGCC 3060 
GCGATAAGCC CATCGACATC AGCCAGTTGA CTAAACAGCG TCTGGTGGAT GCAGATGGCA 3120 
TCATTAATCC CAGCGCTTTC TACATCTACC TGACGGCTTG GGTCAGCAAC GACCCCGTCG 3180 
CGTATGCTGC CTCCCAGGCC AACATCCGGC CACACCGACC AGAATGGGTC CACGACAAAG 3240 
CCGACTACAT GCCTGAAACA AGGCTGAGAA TCCCGGCAGC AGAGCCCATC GAGTATGCCC 3300 
AGTTCCCTTT CTACCTCAAC GGGTTGCGGG ACACCTCAGA CTTTGTGGAG GCAATTGAAA 3360 
AAGTAAGGAC CATCTGCAGC AACTATACGA GCCTGGGGCT GTCCAGTTAC CCCAACGGCT 3420 
ACCCCTTCCT CTTCTGGGAG CAGTACATCG GCCTCCGCCA CTGGCTGCTG CTGTTCATCA 3480 

GCGTGGTGTT GGCCTGCACA TTCCTCGTGT GCGCTGTCTT CCTTCTGAAC CCCTGGACGG 3540 

CCGGGATCAT TGTGATGGTC CTGGCCCTCA TGACGGTCGA GCTGTTCGGC ATGATGGGCC 3600 

TCATCGGAAT CAAGCTCAGT GCCGTGCCCG TGGTCATCCT GATCGCTTCT GTTGGCATAG 3660 

GAGTGGAGTT CACCGTTCAC GTTGCTTTGG CCTTTCTGAC GGCCATCGGC GACAAGAACC 3720 

GCAGGGCTGT GCTTCCCCTG GAGGACATGT TTGCACCCGT CCTGGATGGC GCCGTGTCCA 3780 

CTCTGCTGGG AGTGCTGATG CTGGCGGGAT CTGAGTTCGA CTTCATTGTC AGGTATTTCT 3840 

TTGCTGTGCT GGCGATCCTC ACCATCCTCG GCGTTCTCAA TGGGCTCCTT TTGCTTCCCG 3900 

TGCTTTTGTC TTTCTTTGGA CCATATCCTG AGGTGTCTCC AGCCAACGGC TTGAACCGCC 3960 

TGCCCACACC CTCCCCTGAG CCACCCCCCA GCGTGGTCCG CTTCGCCATG CCGCCCGGCC 4020 

ACACGCACAG CGGGTCTGAT TCCTCCGACT CGGAGTATAG TTCCCAGACG ACAGTGTCAG 4080 

GCCTCAGCGA GGAGCTTCGG CACTACGAGG CCCAGCAGGG CGCGGGAGGC CCTGCCCACC 4140 

AAGTGATCGT GGAAGCCACA GAAAACCCCG TCTTCGCCCA CTCCACTGTG GTCCATCCCG 4200 
AATCCAGGCA TCACCCACCC TCGAACCCGA GACAGCAGCC CCACCTGGAC TCAGGGTCCC 
TGCCTCCCGG ACGGCAAGGC CAGCAGCCCC GCAGGGACCC CCCCAGACAA GGCTTGTGGC 
CACCCCTCTA CAGACCCCGC AGAGACGCTT TTGAAATTTC TACTGAACGG CATTCTGGCC 
CTAGCAATAG GGCCCGCTGG GGCCCTCGCG GGGCCCGTTC TCACAACCCT CGGAACCCAG 
CGTCCACTGC CATGGGCAGC TCCGTGCCCG GCTACTGCCA GCCCATCACC ACTGTGACGG 
CTTCTGCCTC CGTGACTGTC GCCGTGCACC CGCCGCCTGT CCCTGGGCCT GGGCGGAACC 
CCCGAGGGGG ACTCTGCCCA GGCTACCCTG AGACTGACCA CGGCCTGTTT GAGGACCCCC 
ACGTGCCTTT CCACGTCCGG TGTGAGAGGA GGGATTCGAA GGTGGAAGTC ATTGAGCTGC 
AGGACGTGGA ATGCGAGGAG AGGCCCCGGG GAAGCAGCTC CAACTGAGGG TGATTAAAAT 
CTGAAGCAAA GAGGCCAAAG ATTGGAAACC CCCCACCCCC ACCTCTTTCC AGAACTGCTT 
GAAGAGAACT GGTTGGAGTT ATGGAAAAGA TGCCCTGTGC CAGGACAGCA GTTCATTGTT 
ACTGTAACCG ATTGTATTAT TTTGTTAAAT ATTTCTATAA ATATTTAAGA GATGTACACA 
TGTGTAATAT AGGAAGGAAG GATGTAAAGT GGTATGATCT GGGGCTTCTC CACTCCTGCC 
CCAGAGTGTG GAGGCCACAG TGGGGCCTCT CCGTATTTGT GCATTGGGCT CCGTCCCACA 
ACCAAGCTTC ATTAGTCTTA AATTTCAGCA TATGTTGCTG CTGCTTAAAT ATTGTATAAT 
TTACTTGTAT AATTCTATGC AAATATTGCT TATGTAATAG GATTATTTTG TAAAGGTTTC 
TGTTTAAAAT ATTTTAAATT TGCATATCAC AACCCTGTGG TAGTATGAAA TGTTACTGTT 
AACTTTCAAA CACGCTATGC GTGATAATTT TTTTGTTTAA TGAGCAGATA TGAAGAAAGC 
CCGGAATT 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1447 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Ala Ser Ala Gly Asn Ala Ala Gln Pro Gln Asp Arg Gly Gly Gly 
1 5 10 15 

Gly Ser Gly Cys Ile Gly Ala Pro Gly Arg Pro Ala Gly Gly Gly Arg 
20 25 30 

Arg Arg Arg Thr Gly Gly Leu Arg Arg Ala Ala Ala Pro Asp Arg Asp 
35 40 45 

Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu Glu Gln 
50 55 60 

Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp Leu Arg 
65 70 75 80 

Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile Gln Lys 
85 90 95 

Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly Ala Phe 
100 105 110 

Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu Glu Leu 
115 120 125 

Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr Thr Arg 
130 135 140 

Gln Lys Ile Gly Glu Glu Ala Met Phe Asn Pro Gln Leu Met Ile Gln 
145 150 155 160 

Thr Pro Lys Glu Glu Gly Ala Asn Val Leu Thr Thr Glu Ala Leu Leu 
165 170 175 

Gln His Leu Asp Ser Ala Leu Gln Ala Ser Arg Val His Val Tyr Met 
180 185 190 

Tyr Asn Arg Gln Trp Lys Leu Glu His Leu Cys Tyr Lys Ser Gly Glu 
195 200 205 

Leu Ile Thr Glu Thr Gly Tyr Met Asp Gln Ile Ile Glu Tyr Leu Tyr 
210 215 220 

Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly Ala Lys 
225 230 235 240 

Leu Gln Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu Arg Trp 
245 250 255 

Thr Asn Phe Asp Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys Ile Asn 
260 265 270 

Tyr Gln Val Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu Val Gly 
275 280 285 

His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro Asp Cys 
290 295 300 

Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp Met Ala 
305 310 315 320 

Leu Val Leu Asn Gly Gly Cys His Gly Leu Ser Arg Lys Tyr Met His 
325 330 335 

Trp Gln Glu Glu Leu Ile Val Gly Gly Thr Val Lys Asn Ser Thr Gly 
340 345 350 



Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu Met Thr 
355 360 365 

Pro Lys Gln Met Tyr Glu His Phe Lys Gly Tyr Glu Tyr Val Ser His 
370 375 380 

Ile Asn Trp Asn Glu Asp Lys Ala Ala Ala Ile Leu Glu Ala Trp Gln 
385 390 395 400 

Arg Thr Tyr Val Glu Val Val His Gln Ser Val Ala Gln Asn Ser Thr 
405 410 415 

Gln Lys Val Leu Ser Phe Thr Thr Thr Thr Leu Asp Asp Ile Leu Lys 
420 425 430 

Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr Leu Leu 
435 440 445 

Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys Ser Lys 
450 455 460 

Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala Leu Ser 
465 470 475 480 

Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser Phe Asn 
485 490 495 

Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val Gly Val 
500 505 510 

Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly Gln Asn 
515 520 525 

Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys Arg Thr 
530 535 540 

Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala Phe Phe 
545 550 555 560 

„« ai. »i. a. » »• » ~ «■ ~ s " s; oln 

565 

Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu Ile Phe 
580 585 590 

Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg Arg Leu 
595 600 605 

Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val Ile Gln 
610 615 620 

Val Glu Pro Gln Ala Tyr Thr Asp Thr His Asp Asn Thr Arg Tyr Ser 
625 630 635 640 

Pro Pro Pro Pro Tyr Ser Ser His Ser Phe Ala His Glu Thr Gln Ile 
645 650 655 

Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro His Thr 
660 665 670 

His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser Val Gln 
675 680 685 

Pro Val Thr Val Thr Gln Asp Thr Leu Ser Cys Gln Ser Pro Glu Ser 
690 695 700 

Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser Ser Leu 
705 710 715 720 

His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser Phe Ala 
725 730 735 

Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys Val Val 
740 745 750 

Val Ile Phe Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr Gly Thr 
755 760 765 

Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro Arg Glu 
770 775 780 

Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe Ser Phe 
785 790 795 800 

Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn Ile Gln 
805 810 815 

His Leu Leu Tyr Asp Leu His Arg Ser Phe Ser Asn Val Lys Tyr Val 
820 825 830 

Met Leu Glu Glu Asn Lys Gln Leu Pro Lys Met Trp Leu His Tyr Phe 
835 840 845 

Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp Trp Glu 
850 855 860 

Thr Gly Lys Ile Met Pro Asn Asn Tyr Lys Asn Gly Ser Asp Asp Gly 
865 870 875 880 

Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp Lys Pro 
885 890 895 

Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val Asp Ala Asp Gly 
900 905 910 

Ile Ile Asn Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp Val Ser 
915 920 925 

Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg Pro His 
930 935 940 

Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu Thr Arg 
945 950 955 960 

Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe Pro Phe 
965 970 975 



Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala Ile Glu 
980 985 990 

Lys Val Arg Thr Ile Cys Ser Asn Tyr Thr Ser Leu Gly Leu Ser Ser 
995 1000 1005 

Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile Gly Leu 
1010 1015 1020 

Arg His Trp Leu Leu Leu Phe Ile Ser Val Val Leu Ala Cys Thr Phe 
1025 1030 1035 1040 

Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly Ile Ile 
1045 1050 1055 

Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met Met Gly 
1060 1065 1070 

Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu Ile Ala 
1075 1080 1085 

Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu Ala Phe 
1090 1095 1100 

Leu Thr Ala Ile Gly Asp Lys Asn Arg Arg Ala Val Leu Ala Leu Glu 
1105 1110 1115 1120 

His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu Leu Gly 
1125 1130 1135 

Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg Tyr Phe 
1140 1145 1150 

Phe Ala Val Leu Ala Ile Leu Thr Ile Leu Gly Val Leu Asn Gly Leu 
1155 1160 1165 

Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Tyr Pro Glu Val 
1170 1175 1180 

Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro Glu Pro 
1185 1190 1195 1200 

Pro Pro Ser Val Val Arg Phe Ala Met Pro Pro Gly His Thr His Ser 
1205 1210 1215 

Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr Val Ser 
1220 1225 1230 

Gly Leu Ser Glu Glu Leu Arg His Tyr Glu Ala Gln Gln Gly Ala Gly 
1235 1240 1245 

Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro Val Phe 
1250 1255 1260 

Ala His Ser Thr Val Val His Pro Glu Ser Arg His His Pro Pro Ser 
1265 1270 1275 1280 

Asn Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Pro Pro Gly 
1285 1290 1295 

Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly Leu Trp 
1300 1305 1310 

Pro Pro Leu Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser Thr Glu 
1315 1320 1325 

Gly His Ser Gly Pro Ser Asn Arg Ala Arg Trp Gly Pro Arg Gly Ala 
1330 1335 1340 

Arg Ser His Asn Pro Arg Asn Pro Ala Ser Thr Ala Met Gly Ser Ser 
1345 1350 1355 1360 

Val Pro Gly Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser Ala Ser 
1365 1370 1375 

Val Thr Val Ala Val His Pro Pro Pro Val Pro Gly Pro Gly Arg Asn 
1380 1385 1390 

Pro Arg Gly Gly Leu Cys Pro Gly Tyr Pro Glu Thr Asp His Gly Leu 
1395 1400 1405 

Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu Arg Arg Asp 
1410 1415 1420 

Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys Glu Glu Arg 
1425 1430 1435 1440 

Pro Arg Gly Ser Ser Ser Asn 
1445 

1. A DNA sequence other than present in a chromosome encoding a patched gene 
other than the Drosophila patched gene or fragment thereof of at least about 12 bp 
different from the sequence of the Drosophila patched gene. 

2. A DNA sequence according to Claim 1, wherein said patched gene is a 
mammalian gene. 

3. A DNA sequence according to Claim 1 for human, mouse, mosquito, butterfly 
or beetle patched gene. 

4. A DNA sequence according to Claim 3, wherein said DNA sequence is a 
human sequence. 

5. A DNA sequence according to Claim 4, wherein said DNA sequence is a 
mouse sequence. 

6. A DNA sequence according to Claim 1, wherein said DNA sequence is a 
fragment of at least about 18 bp. 

7. A DNA sequence according to Claim 1 joined to a DNA sequence comprising 
a restriction enzyme recognition sequence. 

8. An expression cassette comprising a transcriptional initiation region functional 
in an expression host, a DNA sequence according to Claim 1 under the 
transcriptional regulation of said transcriptional initiation region, and a 
transcriptional termination region functional in said expression host. 

9. An expression cassette according to Claim 8, wherein said transcriptional 
initiation region is heterologous to said DNA sequence according to Claim 1. 

10. An expression cassette according to Claim 8, wherein said transcriptional 
initiation region is homologous to said DNA sequence according to Claim 1 and 
includes the enhancer region. 

11. A cell comprising an expression cassette according to Claim 8 as part of an 
extrachromosomal element or integrated into the genome of a host cell as a result of 
introduction of said expression cassette into said host cell and the cellular progeny of 
said host cell. 

12. A cell according to Claim 11, further comprising the patched protein in the 
cellular membrane of said cell. 

13. A cell according to Claim 11, wherein said patched protein is a mouse patched 
protein. 

14. A cell according to Claim 11, wherein said patched gene is a human patched 
protein. 

15. A cell according to Claim 11, wherein said transcriptional initiation region is a 
Drosophila patched gene transcriptional initiation region comprising the promoter 
and enhancer joined to a heterologous gene. 

16. A cell comprising an expression cassette comprising a transcriptional initiation 
region functional in an expression host, said transcriptional initiation region 
consisting of a 5' non-coding region regulating the transcription of patched protein 
comprising the promoter and enhancer, a marker gene, and a transcriptional 
termination region, as part of an extrachromosomal element or integrated into the 
genome of a host cell as a result of introduction of said expression cassette into said 
host, and the cellular progeny thereof. 

17. A cell according to Claim 16, wherein said transcriptional initiation region is 
the Drosophila region. 

18. A method for following embryonic development employing the patched 
protein in an embryo, said method comprising: 

integrating an expression cassette comprising a transcriptional initiation region 
functional in embryonic host cells, said transcriptional initiation region consisting of 
a 5' non-coding region regulating the transcription of patched protein, a marker 
gene, and a transcriptional termination region, wherein said embryonic host cells are 
capable of developing into a fetus; 

growing said embryonic host cells, whereby proliferation and differentiation 
occur; and 

locating cells comprising expression of the patched protein by means of 
expression of said marker gene. 

19. A method for producing patched protein, said method comprising: 

growing a cell according to Claim 11, whereby said patched protein is 
expressed; and 

isolating said patched protein free of other proteins. 

20. A method for screening candidate compounds for binding affinity to the 
patched protein, said method comprising: 

combining said candidate protein with a vertebrate or invertebrate cell 
comprising said patched protein in the membrane of said cell and an expression 
cassette comprising a transcriptional initiation region functional in said cell, a DNA 
sequence according to Claim 1 comprising the entire coding sequence under the 
transcriptional regulation of said transcriptional initiation region, and a 
transcriptional termination region functional in said cell, expressing patched 
protein in said cell; and 

assaying for the binding of said candidate compound to said patched protein. 

21. A method for screening candidate compounds for agonist activity with the 
patched protein, said method comprising: 

combining said candidate protein with a vertebrate or invertebrate cell 
comprising said patched protein in the membrane of said cell and an expression 
cassette comprising a transcriptional initiation region functional in an expression 
host, said transcriptional initiation region consisting of a 5' non-coding region 
regulating the transcription of patched protein, a marker gene, and a transcriptional 
termination region, as part of an extrachromosomal element or integrated into the 
genome of a host cell; and 

assaying for the expression of said marker gene. 

22. A monoclonal antibody binding specifically to a patched protein, other than 
the Drosophila patched protein. 